ABSTRACT


In  this  thesis,  we  present  two  lossless  compression  approaches.  Our  Rotational  Tree 
Approach  (RTA)  is  based  upon  mathematics  developed  by  Fredricksen.  RTA  uses  the 
rotations  associated  with  binary  necklace  classes  to  disperse  source  bit  strings  to  a  forest 
of Huffman encoding trees. Our Indexed Tree Approach (ITA) also uses a Huffman
forest, but disperses bit strings via a simpler mechanism based upon the first few bits of
each string. For text compression, we find RTA to be competitive with standard
Huffman encoding while ITA is generally superior by a small margin of 1% to 3%. Both
approaches owe their (limited) success to decreased modeling overhead as compared to
standard Huffman encoding. Compression results against the Canterbury Corpus test suite
and complete Java implementation code are included as appendices.
I. INTRODUCTION


A  search  for  the  string  ‘Huffman  Encoding’  in  the  academic  science  and 
engineering  database  Ei  Compendex  Web  [1]  yields  the  following  results: 

•  1970’s  database  -  27  papers  found 

•  1980’s  database  -  61  papers  found 

•  1990’s  database  -  305  papers  found 

Clearly, despite the passing of nearly 50 years since D.A. Huffman’s [2] landmark
work on lossless compression, new uses are still being found for his ideas at
ever-increasing rates. The purpose of this paper is to present two previously untried
approaches to lossless compression, both of which involve the notion of multiple
Huffman encoding trees.

Our Rotational Tree Approach (RTA) is based upon mathematics developed by
Fredricksen [3, 4]. RTA uses the rotations associated with binary necklace classes to
disperse source bit strings to a forest of Huffman encoding trees. Our Indexed Tree
Approach (ITA) also uses a Huffman forest, but disperses bit strings via a simpler
mechanism based upon the first few bits of each string. For text compression, we find
RTA to be very competitive with standard Huffman encoding while ITA is generally
superior by a margin of 1% to 3%.
II. COMPRESSION FOUNDATIONS


A. COMPRESSING / DECOMPRESSING

In  general,  lossless  data  compression  consists  of  reading  a  stream  of  symbols 
from  file  A,  replacing  each  symbol  with  some  (hopefully  shorter)  binary  code,  and 
writing  the  new  codes  to  some  file  B.  File  B  is  now  a  compressed  version  of  file  A  (or 
expanded,  if  the  method  was  ineffective).  The  decompression  process  is  the  inverse.  The 
codes  stored  in  file  B  are  read,  converted  to  the  original  symbols,  and  written  to  some 
new  file  C.  Upon  successful  completion  of  a  lossless  compression  /  decompression 
cycle,  file  C  is  identical  to  file  A. 

B.  LOSSY  VS.  LOSSLESS 

Data  compression  techniques  can  be  divided  into  two  broad  categories  -  lossy  and 
lossless.  As  the  name  implies,  lossy  compression  techniques  sacrifice  some  of  the 
information  content  of  the  source  file.  Lossless  compression  techniques,  on  the  other 
hand,  preserve  all  the  information  content  of  the  original  file.  Lossless  techniques 
demand  that  the  uncompressed  file  be  bit  for  bit  identical  with  the  original  source  file. 

Lossy  compression  is  often  used  on  files  containing  digitized  voice  or  image  data. 
Lossy  compression  makes  sense  in  these  settings  since  digital  audio  and  video  data  files 
are  always  truncations  of  their  original  analog  sources.  Since  one  doesn’t  have  all  the 
data  to  begin  with,  it  is  often  acceptable  to  sacrifice  a  bit  more  for  the  sake  of 
compression  (so  long  as  the  loss  is  not  readily  noticeable  to  the  end  user).  Popular  lossy 
compression  schemes  include  JPEG  (for  images),  MPEG  (for  video),  and  MP3  (for 
audio). Lossy compression, while both useful and interesting, is not the subject of this
thesis  and  will  not  be  discussed  further. 

Lossless  methods  are  used  to  compress  documents,  database  records,  executable 
files,  and  any  other  form  of  data  that  must  be  reproduced  exactly.  The  two  compression 
algorithms  discussed  in  this  paper  are  lossless. 

C.  INFORMATION  THEORY 

The  theory  of  information  developed  by  Claude  Shannon  [5]  in  the  1940’s 
concerns  the  study  of  the  storage,  processing,  and  transmission  of  information. 
Information  Theory  provides  an  objective  mathematical  way  of  measuring  the 
information  content  of  a  particular  symbol  (within  a  given  message)  and  of  the  whole 
message.  Information  content  is  referred  to  as  entropy,  and  is  generally  measured  in  bits. 
The  entropy  E  of  an  input  symbol  i  can  be  calculated  using 

$$E(i) = \log_2\bigl(1/p(i)\bigr),$$

Equation  1 

where  p(i)  is  the  probability  of  i  occurring  in  the  message.  The  entropy  H  of  an  entire 
message (typically a file) is then the sum of the entropies of all the individual symbols in
the message, or

$$H = \sum \log_2\bigl(1/p(i)\bigr)$$

Equation  2 

summed  over  all  the  symbols  in  the  message. 
Consider the frequency distribution of characters in a simple 100-character message:

Characters Found    Frequency    Entropy
A                   45           1.15
E                   16           2.64
H                   13           2.94
R                   12           3.06
G                    9           3.47
F                    5           4.32

Table 1

The total entropy is H = (1.15 * 45) + (2.64 * 16) + (2.94 * 13) + (3.06 * 12)
+ (3.47 * 9) + (4.32 * 5) = 221.76 bits. Shannon’s equations predict that under ideal
circumstances we should be able to represent the above message in 222 bits. We contrast
this with traditional ASCII encoding, which uses 8 bits per character or
8 * 100 = 800 bits to represent the data in the target file. One can immediately see how
much room for improvement via compression actually exists in a typical text file. Thus,
Shannon’s methods provide us with a lower bound against which to measure the
effectiveness of any compression scheme.
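
The entropy calculation above can be sketched in a few lines. This is an illustrative sketch only: Python is used for brevity, and the names `FREQUENCIES`, `symbol_entropy`, and `message_entropy`, as well as the optional `digits` rounding parameter, are assumptions introduced here rather than part of the thesis implementation.

```python
from math import log2

# Character frequencies of the 100-character example message.
FREQUENCIES = {"A": 45, "E": 16, "H": 13, "R": 12, "G": 9, "F": 5}


def symbol_entropy(count, total):
    """Entropy in bits of one symbol: log2(1 / p), with p = count / total."""
    return log2(total / count)


def message_entropy(freqs, digits=None):
    """Sum the per-symbol entropy over every symbol occurrence.

    If digits is given, each per-symbol entropy is first rounded to that
    many decimals, mirroring the two-decimal values listed in the table.
    """
    total = sum(freqs.values())
    bits = 0.0
    for count in freqs.values():
        entropy = symbol_entropy(count, total)
        if digits is not None:
            entropy = round(entropy, digits)
        bits += entropy * count
    return bits
```

Calling `message_entropy(FREQUENCIES, digits=2)` reproduces the 221.76 bits computed above; without rounding, the total comes out slightly higher, just under 222 bits.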

D.  COMPRESSION  STATISTICS 

There  are  many  ways  to  express  the  size  reduction  of  a  file  after  it  has  been 
compressed.  In  this  paper  we  define  the  percentage  of  decrease  (POD)  via  the 
formula 

POD = (change in file size / original file size) * 100,

Equation  3 

where  the  change  in  file  size  =  original  file  size  -  compressed  file  size. 

In  the  event  that  the  compression  fails  (bloats  the  file)  the  resulting  POD  will  be 
negative. 
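
As a small sketch of Equation 3 (the function name `percentage_of_decrease` is an assumption made for illustration):

```python
def percentage_of_decrease(original_size, compressed_size):
    """POD = (original - compressed) / original * 100.

    The multiplication is done before the division so that simple
    integer inputs give exact results.
    """
    change = original_size - compressed_size
    return change * 100 / original_size
```

For instance, a 1000-byte file compressed to 600 bytes has a POD of 40.0, while a file bloated to 1100 bytes has a POD of -10.0.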
E. STATISTICAL VS. DICTIONARY MODELING


A  compression  model  is  a  collection  of  data  and  rules  used  to  map  each  input 
symbol  to  an  output  code.  The  term  symbol  is  intentionally  vague,  as  a  symbol  can 
represent  an  arbitrarily  long  string  of  bits  deemed  useful  by  the  chosen  model.  A 
compression  program  uses  its  model  to  accurately  define  the  probabilities  for  each  input 
symbol.  The  model  allows  the  program  to  produce  an  appropriate  code  for  each  symbol. 
Lossless  compression  is  generally  implemented  using  either  statistical  or  dictionary 
modeling. 

In  statistical  modeling  the  compression  program  analyzes  the  source  file  and  finds 
the most common symbols. Typically, statistical modeling uses fixed-length symbols.
Then each input symbol is assigned a replacement code. Replacement codes are
normally short for the more common symbols and longer for the less common ones. An
output file is then built by representing each symbol in the source file by its replacement
code. When all the replacements are made the output file should be shorter than the
source file since the more common symbols all get shorter representations. Even though
some symbols get longer representations, they occur only infrequently in the text, thus
they  have  a  limited  effect  on  the  output  file  length. 

In dictionary modeling a single code is used to replace a string of fixed-length
symbols.  Equivalently,  one  could  say  that  symbols  are  of  variable  length.  Regardless  of 
the  interpretation,  a  program  using  a  dictionary  model  merely  builds  a  list  of  the  most 
frequent  words,  phrases,  bit  strings,  or  input  symbols  found  in  the  source  file.  Each  entry 
in  the  dictionary  is  then  assigned  a  fixed-length  replacement  code,  which  can  be  thought 
of  as  simply  an  index  into  the  dictionary. 
The compression program uses its dictionary much like the authors of a thesis use
the  list  of  references  at  the  end  of  the  paper.  Instead  of  writing  out  the  full  citation  for  a 
referenced  paper,  a  number  is  used  to  indicate  the  citation’s  index  within  the 
bibliography.  The  reader  of  the  thesis  decompresses  the  citation  by  finding  the  indicated 
position  within  the  list  of  references  and  mentally  replaces  the  number  with  the  full 
citation.  Similarly,  the  dictionary  compression  program  replaces  input  symbols  with  their 
assigned  replacement  codes  (indices),  and  the  decompression  program  does  the  inverse. 

To  draw  a  distinction  between  statistical  and  dictionary  modeling,  note  that  in 
statistical  modeling  fixed-length  symbols  are  replaced  by  variable-length  codes,  but  in 
dictionary  modeling  variable-length  symbols  are  replaced  by  fixed-length  codes. 

F.  STATIC  VS.  ADAPTIVE  MODELS 

The  models  in  the  previous  section  were  described  in  a  ‘static’  way,  i.e.,  each 
model  was  created  only  after  the  entire  source  file  was  completely  analyzed.  This  means 
two  passes  through  the  source  file  are  required  -  the  first  to  build  the  model  and  the 
second to do the actual symbol replacement. If a file is compressed using a static
model  then  the  decompression  program  must  also  have  access  to  the  same  model  in  order 
to  transform  the  codes  in  the  compressed  file  back  into  the  symbols  from  the  original 
source  file.  In  many  cases,  this  means  that  the  model  will  have  to  be  included  as  part  of 
the  compressed  file.  Model  overhead  information  is  normally  located  at  the  beginning  of 
the  compressed  file  in  an  area  referred  to  as  the  file  header.  Header  overhead  is  one  of 
the  major  performance  burdens  of  static  methods. 

An alternative to static modeling is adaptive modeling (sometimes also referred to
as dynamic modeling). In adaptive modeling the compression program updates the
model as the output file is being created. This means that the output codes produced by
the compression program might have different meanings depending upon their relative
position in the output file.


Figure 1: General Adaptive Compression [6]

There  are  several  advantages  to  adaptive  modeling.  First,  only  one  pass  is 
required  to  encode  the  source  file.  Second,  adaptive  techniques  quickly  tune  themselves 
to  the  file  they  are  encoding  and  are  thus  better  able  to  compress  files  containing 
markedly  different  types  of  data.  Third,  modeling  information  need  not  be  explicitly 
included  in  the  output  file  header  as  it  is  with  static  techniques.  Instead,  the  adaptive 
decompression  program  simply  begins  with  the  same  initial  model  as  the  compression 
program  did,  and  then  updates  its  model  each  time  it  reads  a  code  from  the  compressed 
file.  Thus,  although  the  decompression  program’s  model  is  in  a  constant  state  of  change, 
it  is  always  identical  to  the  compression  program’s  model  at  the  same  point  in  the 
process.  Obviously,  adaptive  techniques  rely  on  the  compression  and  decompression 
programs both beginning with the same model, and both updating the model in exactly
the  same  fashion.  The  following  diagram  depicts  the  decompression  process. 


[Flow diagram with the stages: Codes, Read Input Code, Decode Symbol, Update Model, Output Symbol, Symbols.]

Figure 2: General Adaptive Decompression [6]
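
To make the lock-step behavior of adaptive models concrete, here is a toy sketch. It is not one of the techniques surveyed later: the "rank by running count" model, the class name `RankModel`, and the functions `adaptive_encode` and `adaptive_decode` are all assumptions chosen only to show a compressor and decompressor that start from the same model and update it identically.

```python
class RankModel:
    """Toy adaptive model: symbols are ranked by how often they have been seen."""

    def __init__(self, alphabet):
        # Both sides start from the same initial model (all counts zero).
        self.counts = {symbol: 0 for symbol in alphabet}

    def ranking(self):
        # Most frequent first; ties are broken by symbol order so that the
        # compressor and decompressor always agree.
        return sorted(self.counts, key=lambda s: (-self.counts[s], s))

    def update(self, symbol):
        self.counts[symbol] += 1


def adaptive_encode(message, alphabet):
    model = RankModel(alphabet)
    codes = []
    for symbol in message:
        codes.append(model.ranking().index(symbol))  # emit the current rank
        model.update(symbol)                         # then update the model
    return codes


def adaptive_decode(codes, alphabet):
    model = RankModel(alphabet)
    symbols = []
    for code in codes:
        symbol = model.ranking()[code]  # same model state as the encoder had
        symbols.append(symbol)
        model.update(symbol)            # identical update keeps both in step
    return "".join(symbols)
```

Encoding "abracadabra" over the alphabet "abcdr" emits the code 4 for both the first "r" and the "d", which illustrates how the same output code can carry different meanings at different positions in the output.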

G.  SURVEY  OF  LOSSLESS  COMPRESSION  TECHNIQUES 

There  are  literally  hundreds  of  serious  data  compression  programs  available  on 
the  web.  The  interested  reader  is  referred  to  the  excellent  compression  resource,  “The 
Data  Compression  Library”  [7].  We  do  not  survey  these  compression  implementations. 
Instead,  we  present  the  foundational  techniques  from  which  the  current  generation  of 
compression  programs  has  sprung.  In  this  way  we  hope  to  paint  a  clear  picture  of  the 
field  of  compression,  and  to  show  where  our  methods  fit  into  the  field. 

A  word  of  warning  for  those  who  do  decide  to  explore  the  world  of  compression 
via  the  Internet:  The  vast  majority  of  commercial  applications  are  enhancements  and 
hybrids of the more basic techniques that follow in this section. These applications are
typically  tuned  for  specific  data  or  specific  performance  (speed,  memory  footprint,  or  file 
size).  Few  limit  themselves  to  any  one  compression  technique.  Instead,  multiple 
techniques are used in an effort to squeeze the maximum possible compression out of
source  files.  Naturally,  this  leads  to  increased  algorithm  and  program  complexity.  In 
short,  the  Internet  may  not  be  the  best  place  to  learn  the  fundamentals  of  compression. 
Two  excellent  books  on  the  subject  are  “The  Data  Compression  Book”  [6]  and 
“Compression  Algorithms  for  Real  Programmers”  [8]. 

Most  of  the  following  techniques  can  use  either  a  static  or  adaptive  (dynamic) 
model.  Interestingly,  most  statistical  approaches  seem  to  have  been  originally  conceived 
in  a  static  manner  and  later  explored  using  adaptive  methods.  On  the  other  hand, 
dictionary  techniques  typically  began  as  adaptive  methods,  and  only  later  began  to  be 
investigated  statically.  The  techniques  that  follow  are  presented  using  the  method  with 
which  they  were  originally  developed. 

1.  Run  Length 

Long  runs  of  repeated  characters  are  one  of  the  simplest  forms  of  redundancy 
found  in  a  file  [9, 10].  Consider  the  string  of  repeated  characters, 
AAAAAAAABBBCCCCCCCDDDDEEEEEEEEE. 

A  run  length  encoding  scheme  might  represent  the  string  as 
8A3B7C4D9E. 

Unfortunately,  typical  text  does  not  exhibit  these  types  of  patterns.  There  are 
some  applications  of  this  technique  -  the  runs  of  black  and  white  pixels  on  a  fax  image 
for  example,  but  in  general  it  is  not  particularly  suited  as  a  stand-alone  method.  Typical 
text compression using run length encoding is less than 5%. Of course, greater
compression  results  can  be  achieved  with  specific  data  (e.g.,  fax  images). 
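
A minimal sketch of the count-then-character scheme shown above follows. The function names `rle_encode` and `rle_decode` are assumptions, and the sketch assumes the input contains no digit characters, since otherwise this naive format would be ambiguous.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with <run length><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode; assumes the original text contained no digits."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(length) for length, char in pairs)
```

On ordinary text the scheme tends to expand rather than compress: `rle_encode("hello")` yields `"1h1e2l1o"`, which is longer than the input.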
2. Shannon-Fano


This  technique  uses  a  static  statistical  model  to  build  a  binary  tree  in  a  top-down 
fashion.  Each  leaf  of  the  tree  represents  an  input  symbol,  while  the  path  from  the  root 
node to a leaf gives the leaf’s replacement code. The replacement codes generated by the
algorithm share the following properties:

•  Codes are of variable length.

•  Codes for high-probability symbols are shorter than codes for low-probability symbols.

•  No code is a prefix of any other code from the same tree.

The Shannon-Fano algorithm [11] proceeds through the following steps:

1. Build a frequency distribution for the input symbols.

2. Sort the list of symbols in descending order by frequency.

3. Initialize the replacement code for each symbol to be null.

4. Divide the list in two parts with the total frequency count of the upper half
being as close as possible to the total frequency count of the lower half.

5. Append a 0 to the replacement code of each symbol in the upper half and a 1 to
the replacement code of each symbol in the lower half.

6. Recursively apply steps 4 and 5 to each partial list until each list contains
exactly one symbol.

Although an excellent technique, Shannon-Fano encoding has generally been
replaced by Huffman encoding, which is marginally superior.
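
The six steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the function name `shannon_fano_codes` is invented here, ties in the split are resolved in favor of the earliest cut, and a one-symbol input simply receives an empty code.

```python
def shannon_fano_codes(freqs):
    """Build Shannon-Fano codes top-down from a {symbol: frequency} mapping."""
    ordered = sorted(freqs, key=lambda s: -freqs[s])  # step 2: descending order
    codes = {symbol: "" for symbol in ordered}        # step 3: empty codes

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        running, best_cut, best_gap = 0, 1, total
        # Step 4: find the cut that best balances the two halves.
        for cut in range(1, len(group)):
            running += freqs[group[cut - 1]]
            gap = abs(total - 2 * running)
            if gap < best_gap:
                best_cut, best_gap = cut, gap
        upper, lower = group[:best_cut], group[best_cut:]
        # Step 5: extend the codes of each half.
        for symbol in upper:
            codes[symbol] += "0"
        for symbol in lower:
            codes[symbol] += "1"
        # Step 6: recurse on each partial list.
        split(upper)
        split(lower)

    split(ordered)
    return codes
```

Applied to the frequencies of the 100-character example message (A 45, E 16, H 13, R 12, G 9, F 5), the sketch yields code lengths of 1, 3, 3, 3, 4, and 4 bits, for a total of 224 bits.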

3.  Huffman 

Standard Huffman encoding [2] uses a static statistical model to build a binary
tree in a bottom-up fashion. The codes generated by the tree share all the properties of
those generated by Shannon-Fano and have the added advantage of being
provably optimal. Optimal in this case means that there isn’t a better set of
integral-length replacement codes than those generated by the algorithm.

The  Huffman  algorithm  proceeds  as  follows: 

1. Build a frequency distribution for the input symbols.

2. Create a collection C of single-node trees, each associated with a symbol and
its frequency.

3. Let t1 and t2 be two trees with the lowest frequencies in collection C. Replace t1
and t2 with a single tree t3, formed by attaching t1 and t2 as the left and right descendants
of a new node. The frequency associated with t3 is the sum of the frequencies of t1 and
t2.

4. While more than one tree remains in C, repeat step 3.
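
The steps above can be sketched with a priority queue. The function name `huffman_codes`, the use of nested tuples as trees, and the tie-breaking counter are assumptions made for this illustration; symbols are assumed to be plain strings so that they can be told apart from internal nodes.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build Huffman codes bottom-up from a {symbol: frequency} mapping."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Step 2: one single-node tree per symbol.
    heap = [(freq, next(tiebreak), symbol) for symbol, freq in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    # Steps 3 and 4: merge the two lowest-frequency trees until one remains.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                        # leaf: a symbol
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

For the 100-character example message (A 45, E 16, H 13, R 12, G 9, F 5) the resulting code lengths are 1, 3, 3, 3, 4, and 4 bits. The exact bit patterns depend on how left and right children are assigned, but the lengths do not.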


Though  optimal,  Huffman  encoding  does  not  quite  reach  Shannon’s  predicted 
entropy  limit.  Consider  the  breakdown  of  our  imaginary  100-character  message  and  its 
associated  Huffman  tree  shown  below: 


Symbol   Frequency   Entropy   Total Entropy   Huffman Code    Total Huffman Code
                     (bits)    (bits)          Length (bits)   Length (bits)
A        45          1.15      51.75           1               45
E        16          2.64      42.24           3               48
H        13          2.94      38.22           3               39
R        12          3.06      36.72           3               36
G         9          3.47      31.23           4               36
F         5          4.32      21.60           4               20
Total    100                   221.76                          224

Table 2: Frequency Table for 100-Character Message

[Tree diagram with branches labeled 0 and 1.]

Figure 3: Huffman Tree

We  see  that  while  the  total  information  content  of  the  message  is  221.76  bits, 
Huffman  requires  224  bits  to  actually  encode  it.  While  this  is  not  bad,  it  is  not  truly 
optimal.  The  deviation  from  the  predicted  value  is  due  to  the  inability  of  the  method  to 
accurately  represent  the  fractional  entropy  of  each  symbol.  Thus,  while  there  is  no  better 
integral  length  encoding  scheme,  there  is  still  room  to  improve  on  Huffman’s  algorithm. 
Note  that  Shannon-Fano  encoding  will  also  require  224  bits  for  this  file. 
4. Arithmetic


Arithmetic  coding  [12,  13,  14]  can  use  either  a  static  or  an  adaptive  statistical 
model.  It  completely  discards  Huffman’s  idea  of  replacing  an  input  symbol  with  a 
specific  code,  and  instead  replaces  the  entire  string  of  the  input  symbols  in  a  message 
with a single floating point number between 0 and 1. For this reason it does not suffer
the  discrete  code-length  limitations  of  Huffman  encoding.  Arithmetic  encoding  uses  a 
probability  distribution  to  assign  a  proportionate  range  between  0  and  1  to  each  input 
symbol.  It  is  typically  able  to  approach  the  Shannon  theoretical  minimum  -  sometimes 
yielding  improvements  of  up  to  10%  over  standard  Huffman  encoding.  The  major 
drawback  of  the  technique  is  that  it  is  computationally  intensive  and  therefore  slow. 

Arithmetic  Encoding  Algorithm: 

1. Build a probability table for all the input symbols.

2. Establish proportionate lower bounds (Li) and upper bounds (Ui) between 0
and 1 for each symbol based upon its probability of occurrence.

3. low = 0.0, high = 1.0, range = 1.0

4. Read in the next symbol to encode Si.

5. range = high - low.

6. high = low + range * Ui.

7. low = low + range * Li.

8. While more symbols remain to encode, repeat from step 4.

9. Output the floating point number between low and high that has the shortest
binary representation.
Consider a hypothetical message written in a 3-symbol alphabet D, A, X with the
following  probability  distribution  and  bounds.  Notice  that  each  character  is  assigned  the 
portion  of  the  0  to  1  range  that  corresponds  to  its  probability  of  appearance. 


Symbol Index   Symbol   Probability   Lower Bound   Upper Bound
(i)            (Si)     (Pi)          (Li)          (Ui)
1              D        0.20          0             0.20
2              A        0.18          0.20          0.38
3              X        0.62          0.38          1

Table 3: Probability Table for Arithmetic Encoding

To see how the method works, we encode the message “DXXA” one symbol at a
time. The first symbol “D” has a range from 0 to 0.20 so we know that the final encoded
value for the message must fall within this range. Each time we add a new symbol we
further restrict the range of our final value that represents the entire message. Thus, the
next symbol “X” restricts us to the last 62% of the current range, or 0.076 to 0.2. The
next symbol is another “X” so we again restrict the output to the last 62% of the current
range, or 0.12312 to 0.2. The final symbol “A” limits us to between 20% and 38% of the
current range, or 0.138496 to 0.1523344. Now we pick the value in the final range with
the shortest binary representation. This is accomplished with some relatively
straightforward binary arithmetic. In this case the shortest value turns out to be
0.140625 = 0.001001 base 2. The message is encoded as a unique floating-point value
between 0 and 1. We require 6 bits to compress 4 symbols (about the same performance
as Huffman on this particular example).
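
The interval narrowing of steps 3 through 9 can be checked with exact rational arithmetic. This is a sketch under stated assumptions rather than a practical coder: the names `BOUNDS`, `narrow_interval`, and `shortest_binary_fraction` are invented here, the final interval is treated as half-open, and real implementations use scaled integer arithmetic and must also tell the decoder where the message ends.

```python
from fractions import Fraction
from math import ceil

# Lower and upper bounds (Li, Ui) for the three-symbol alphabet D, A, X.
BOUNDS = {
    "D": (Fraction("0"), Fraction("0.20")),
    "A": (Fraction("0.20"), Fraction("0.38")),
    "X": (Fraction("0.38"), Fraction("1")),
}


def narrow_interval(message, bounds=BOUNDS):
    """Apply steps 3 to 8: shrink [low, high) once per input symbol."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        lower, upper = bounds[symbol]
        span = high - low          # step 5
        high = low + span * upper  # step 6
        low = low + span * lower   # step 7
    return low, high


def shortest_binary_fraction(low, high):
    """Step 9: bits after the binary point of the shortest value in [low, high)."""
    bits = 1
    while True:
        numerator = ceil(low * 2**bits)  # smallest k with k / 2^bits >= low
        if Fraction(numerator, 2**bits) < high:
            return format(numerator, f"0{bits}b")
        bits += 1
```

For the message "DXXA" the interval narrows to 0.138496 through 0.1523344, and the shortest binary fraction inside it is 0.001001 (that is, 9/64 = 0.140625), which takes 6 bits.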

Consider the resulting partition of the number line shown below to see how any
string in our alphabet could be similarly, yet uniquely, encoded. Notice that there is
exactly one path to any given “leaf” of our fuzzy ternary tree.
[Diagram: the number line from 0 to 1, subdivided level by level into D, A, and X intervals.]

Table 4: Partitioning of the Number Line by Arithmetic Encoding:
A Fuzzy n-ary Tree (intervals not to scale)


5.  Ziv  and  Lempel 

Jacob  Ziv  and  Abraham  Lempel  are  essentially  the  fathers  of  adaptive  dictionary 
compression. Their first algorithm, commonly referred to as LZ77 [14], uses previously
seen text as a dictionary. The algorithm replaces variable-length phrases (symbols) from
its input stream with fixed-length indices (codes) into its dictionary. What makes this an
adaptive method is that its dictionary is literally a sliding window consisting (in a typical
implementation) of the last 4k bytes of data from the input stream. Thus, while new
groups of symbols are being read into a look-ahead buffer, the algorithm is searching for
matches  between  the  look-ahead  buffer  and  all  strings  located  in  the  previous  4k  bytes  of 
data  (the  dictionary).  If  a  match  is  found  then  an  ordered  triple  (a,  b,  c)  is  sent  to  the 
output  file.  The  triple  consists  of: 

•  An  index  a  into  the  previous  4k  of  data. 

•  The  length  b  of  the  match. 

•  The  symbol  c  immediately  following  the  match  in  the  look-ahead  buffer. 
If no match is found then a and b are 0 and c is the first symbol in the look-ahead
buffer.  The  text  window  and  look-ahead  buffer  then  slide  forward  sufficiently  to  shift  the 
matching  symbol(s)  out  of  the  look-ahead  buffer  and  into  the  dictionary.  The  process 
then  repeats. 

The  amount  of  compression  achieved  by  this  method  depends  upon  the  dictionary 
entries being sufficiently longer than the ordered triples they are replaced with. Here is a
simplified  example  of  the  process: 


Input Stream: ...upon a time there was a wise man who stumbled upon a timeless secret.

[The Sliding Dictionary covers the text through “stumbled”; the Look-Ahead Buffer begins at the second “upon”.]

Figure 4: LZ77 Algorithm


Suppose  our  sliding  dictionary  is  64  bytes  long  and  the  look-ahead  buffer  is  16 
bytes  long.  Further  suppose  that  the  first  occurrence  of  the  word  “upon”  begins  at 
position 35 in the current sliding dictionary. Since a match has been found, our
compression program will now send the ordered triple (35, 11, “l”) to the output file.
Since the dictionary is 64 bytes long, 6 bits are required to represent each of the index and
length of the match, plus 8 bits for the ASCII character “l”. Thus, we have just encoded
an 11 * 8 = 88 bit phrase in 6 + 6 + 8 = 20 bits.
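
The triple-based scheme described above can be sketched as follows. This is a simplified illustration, not a production LZ77: the function names `lz77_encode` and `lz77_decode` are assumptions, matches are restricted to lie entirely inside the dictionary, the search is a brute-force scan, and triples are kept as Python tuples rather than packed into bits.

```python
def lz77_encode(data, window=64, lookahead=16):
    """Emit (index, length, next symbol) triples over a sliding dictionary."""
    position, triples = 0, []
    while position < len(data):
        start = max(0, position - window)  # left edge of the dictionary
        best_index, best_length = 0, 0
        # Leave room for the symbol that follows the match.
        max_length = min(lookahead - 1, len(data) - position - 1)
        for candidate in range(start, position):
            length = 0
            while (length < max_length
                   and candidate + length < position
                   and data[candidate + length] == data[position + length]):
                length += 1
            if length > best_length:
                best_index, best_length = candidate - start, length
        triples.append((best_index, best_length, data[position + best_length]))
        position += best_length + 1        # slide past match and next symbol
    return triples


def lz77_decode(triples, window=64):
    """Rebuild the text by copying from the already decoded window."""
    output = []
    for index, length, symbol in triples:
        start = max(0, len(output) - window)
        output.extend(output[start + index:start + index + length])
        output.append(symbol)
    return "".join(output)
```

For example, `lz77_encode("abcabcabcx")` produces three literal triples for "a", "b", and "c", followed by (0, 3, "a") and (1, 2, "x").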

There  are  two  problems  with  LZ77.  First,  since  the  dictionary  is  constantly 
sliding,  the  algorithm  sometimes  actually  throws  away  valuable  old  strings  for  worthless 
new ones. This is a potential drawback to adaptive methods in general. Second, the
length  of  pattern  matches  is  limited  to  the  length  of  the  look-ahead  buffer.  Increasing  the 
size  of  the  look-ahead  buffer,  however,  has  a  nasty  side  effect  -  it  proportionately 
increases  the  number  of  string  comparisons  that  must  be  performed  against  the 
dictionary.  This  creates  a  significant  performance  problem.  Thus,  LZ77  is  not  able  to 
store  long  strings  to  its  dictionary  even  when  it  would  be  highly  profitable  to  do  so. 

A  second  adaptive  dictionary  technique  developed  by  Ziv  and  Lempel,  referred  to 
as  LZ78  [16],  uses  a  different  approach  to  building  its  dictionary.  Instead  of  maintaining 
a  sliding  window  into  the  preceding  text,  LZ78  incrementally  builds  its  dictionary  from 
strings  found  in  all  of  the  preceding  text.  The  algorithm  does  this  by  growing  dictionary 
entries  one  character  at  a  time.  For  example,  the  first  time  the  string  “hello”  is 
encountered  it  is  not  put  in  the  dictionary.  Instead,  “h”  is  added  to  the  dictionary.  The 
second  time  “hello”  is  encountered  it  is  still  not  put  in  the  dictionary.  Instead,  “he”  is 
added.  The  third  occurrence  puts  “hel”  into  the  dictionary,  and  so  on  until  the  fifth 
occurrence of the string “hello” actually causes “hello” to become part of the dictionary.
This incremental procedure of slowly adding a string to the dictionary does a good job of
eventually getting all of the frequently used strings into the dictionary. Once the
dictionary begins to accurately model the source file, large savings can be achieved as the
algorithm  replaces  long  frequently  occurring  strings  with  short  fixed-length  indices. 
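
The incremental dictionary growth can be sketched as below. The (index, character) output pair is the usual textbook formulation of LZ78 and is an assumption here, since the text above does not spell out the output format; the function names `lz78_encode` and `lz78_decode` are likewise invented for illustration, and no limit is placed on dictionary size.

```python
def lz78_encode(data):
    """Emit (dictionary index, next character) pairs, growing the dictionary."""
    dictionary = {"": 0}  # the dictionary starts with only the null string
    pairs = []
    phrase = ""
    for char in data:
        if phrase + char in dictionary:
            phrase += char  # keep extending the current match
        else:
            pairs.append((dictionary[phrase], char))
            dictionary[phrase + char] = len(dictionary)  # one char longer
            phrase = ""
    if phrase:
        # Flush a trailing match; every prefix of an entry is also an entry.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the text while growing the same dictionary as the encoder."""
    phrases = [""]
    output = []
    for index, char in pairs:
        phrase = phrases[index] + char
        output.append(phrase)
        phrases.append(phrase)
    return "".join(output)
```

Encoding "hello hello" adds "h" to the dictionary at the first occurrence and "he" at the second, so the sixth emitted pair is (1, "e"): the index of "h" followed by "e".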

LZ78 does not suffer from the same limitations as LZ77. It is able to handle long
strings and once it adds a string to its dictionary it remains there until the entire file is
processed. LZ78 has its own difficulties, however. In LZ77 the dictionary is easier to
manage. It is a fixed block of already processed data. In LZ78 the dictionary starts with
nothing but the null string and grows. It must be managed with a data structure
(typically an n-ary tree that looks remarkably like the fuzzy tree shown earlier for
arithmetic encoding) and limited to some finite size (typically around 2'^ entries).
Despite these problems LZ78, like LZ77, was a groundbreaking compression approach.
Terry Welch later enhanced LZ78 - the resulting algorithm is commonly referred to as
LZW.
III. MATHEMATICAL FOUNDATIONS


A.  INTRODUCTION 

In  this  chapter  we  will  discuss  the  necklace  algorithm  and  the  properties  of  the 
necklace  algorithm  that  we  find  to  be  useful  for  compression.  First  we  explore  properties 
of  binary  n-tuple  necklaces.  Next,  we  consider  equivalence  classes  of  necklaces  and  how 
to  enumerate  the  equivalence  classes  for  a  given  n  using  the  Burnside  formula  [17].  Then 
we  use  the  Theta  Algorithm  [3]  to  generate  all  of  the  equivalent  necklaces  for  a  given  n. 
Next,  we  consider  the  space  reduction  that  can  be  realized  from  the  sub-cyclic  properties 
of  necklaces.  Finally,  we  describe  a  close  relative  to  the  necklace  classes,  the  Lyndon 
Words. 

B.  NECKLACE  CLASSES 

The binary n-tuples are strings of 0’s and 1’s of a fixed length n. The circular
rotation of a binary n-tuple produces a cyclic equivalence class of binary n-tuples called a
necklace class. We represent each necklace class by the numerically largest n-tuple in the
class. This gives us Z[n] necklaces representing the $2^n$ strings from 000...0 to 111...1,
where Z[n] is defined in equation 4. The necklace algorithm provides an efficient way to
list  these  necklace  classes.  The  necklace  algorithm  typically  produces  the  necklaces  in  the 
order  from  largest  to  smallest.  There  is  no  difference  in  which  way  we  represent  these 
strings  and  an  equivalent  version  of  the  necklace  algorithm  could  invert  the  order  of  the 
necklace  classes. 
1. Equivalence Classes and the Burnside Formula


A  necklace  is  an  equivalence  class  of  strings  of  bits  of  length  n  that  can  be  rotated 
to  obtain  equivalent  aspects  of  the  same  necklace.  Thus,  two  necklaces  are  said  to  be 
inequivalent  if,  no  matter  how  rotated,  one  cannot  be  transformed  into  the  other.  Z[n] 
gives  the  number  of  inequivalent  necklaces  of  length  n,  where  Z[n]  is  determined  by 
using  an  instance  of  the  Burnside  enumeration  formula: 

$$Z[n] = \frac{1}{n} \sum_{d \mid n} \phi(d) \, 2^{n/d}$$

Equation 4

Here the sum runs over all divisors d of n, and φ(d) is Euler's Totient Function, defined
as the number of positive integers not exceeding d that are relatively prime to d.

The  formula  in  equation  4  enumerates  the  necklace  classes  for  a  given  n,  and  the 
necklace  algorithm  tells  us  what  the  necklaces  are.  To  illustrate  the  use  of  the  formula 
we  consider  an  example: 

Let n = 6; then

$$Z[6] = \frac{1}{6}\left(1 \cdot 2^{6} + 1 \cdot 2^{3} + 2 \cdot 2^{2} + 2 \cdot 2^{1}\right) = \frac{1}{6}(64 + 8 + 8 + 4) = \frac{84}{6} = 14,$$

where the four terms correspond to d = 1, d = 2, d = 3 and d = 6, respectively.

When d = 1, only 1 is relatively prime to 1. When d = 2, again only 1 is relatively
prime to 2. When d = 3, 1 and 2 are relatively prime to 3. When d = 6, both 1 and 5 are
relatively prime to 6.
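
The enumeration formula of equation 4 can be checked mechanically. The following is a minimal Java sketch, not the thesis implementation; the class name NecklaceCount and its method names are hypothetical, and the totient is computed by brute force for clarity rather than speed.

```java
final class NecklaceCount {
    /** Euler's totient: how many integers in 1..d are relatively prime to d. */
    static int totient(int d) {
        int count = 0;
        for (int k = 1; k <= d; k++) {
            if (gcd(k, d) == 1) count++;
        }
        return count;
    }

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Z[n] = (1/n) * sum over divisors d of n of totient(d) * 2^(n/d), for 1 <= n <= 60. */
    static long countNecklaces(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) sum += (long) totient(d) * (1L << (n / d));
        }
        return sum / n;
    }
}
```

Calling NecklaceCount.countNecklaces(6) reproduces the 14 classes of the worked example.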

We identify two upper bounds for the number of necklace classes of length n. A
simple upper bound is $2^{n+1}/n$. A tighter upper bound is
$2^{\,n + 1 - \lceil \log_2 n \rceil}$. When n = 6, we know that there are 14 necklace
classes. If we compute the first upper bound we get 21.33, while the second upper bound
yields 16. We use the tighter upper bound in the implementation of our compression
algorithms.
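
As a quick check of the two bounds, here is a small Java sketch. It assumes the bounds are 2^(n+1)/n and 2^(n+1-ceil(log2 n)), the forms that agree with the values 21.33 and 16 quoted for n = 6; the class and method names are hypothetical.

```java
final class NecklaceBounds {
    /** Simple bound, assumed to be 2^(n+1) / n. */
    static double simpleBound(int n) {
        return Math.pow(2, n + 1) / n;
    }

    /** Smallest b with 2^b >= n, i.e. ceil(log2 n), computed without floating point. */
    static int ceilLog2(int n) {
        int bits = 0;
        while ((1 << bits) < n) bits++;
        return bits;
    }

    /** Tighter bound, assumed to be 2^(n + 1 - ceil(log2 n)). */
    static long tightBound(int n) {
        return 1L << (n + 1 - ceilLog2(n));
    }
}
```

The tighter bound is always a power of two, which is convenient when sizing a table of class indices.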

2.  Necklace  Algorithm 

The necklace algorithm is fairly straightforward. First we need to define the θ
step. This operation $\theta: V^n \to V^n$ is defined as follows:
$\theta(a_1, a_2, \ldots, a_n) = (b_1, b_2, \ldots, b_n)$, where $b_i = a_i$ for
$i = 1, 2, \ldots, j-1$ and j is defined as the largest subscript such that $a_j > 0$
and $a_k = 0$ for all $k > j$. Then

$$b_j = a_j - 1, \qquad b_{j+t} = b_t \quad \text{for } t = 1, 2, \ldots, n - j.$$
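
A single θ step can be sketched in Java as follows. This is an illustrative sketch only: it stores the n-tuple in an int array with a[0] as the leftmost element, and the class name ThetaStep is hypothetical.

```java
final class ThetaStep {
    /**
     * Applies one theta step to a in place (a[0] is the leftmost element) and
     * returns j, the 1-based position of the element that was decremented.
     * Returns 0 and leaves a unchanged if a is all zeros.
     */
    static int theta(int[] a) {
        int n = a.length;
        int j = n;
        while (j >= 1 && a[j - 1] == 0) j--;   // largest j with a_j > 0
        if (j == 0) return 0;
        a[j - 1]--;                             // b_j = a_j - 1
        for (int t = 1; t <= n - j; t++) {
            a[j + t - 1] = a[t - 1];            // b_{j+t} = b_t
        }
        return j;
    }
}
```

Applied to 111110, the step yields 111101 with j = 5; applied to 111000, it yields 110110 with j = 3.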

Necklace Algorithm (for necklaces of length n) [3]

0.  The initial necklace is 11...1 = 1^n.

1.  To find the (i+1)st necklace, apply θ to the i-th necklace.

2.  The resulting string is the next necklace if and only if j|n.

3.  If j does not divide n, then apply θ^2, θ^3, ..., θ^k to the i-th necklace until the
smallest k is found so that j|n. The resulting string is the next necklace.

4.  If the necklace found is not the last necklace, namely 0^n, return to step 1.

Note: If step 2 is modified to require j = n, we produce a list of all Lyndon words
of length n (Lyndon words are discussed in more detail in section C).

To illustrate and to clarify the algorithm, start with the largest n-long necklace (an
n-long string of 1's), subtract one from the last 1 in the string, and copy the first j
bits of the result (a_1 ... a_j - 1) repeatedly until an n-long string is formed. Now do
the "j divides n" check. If j|n then the string formed is a necklace; if not, subtract
one again from the last non-zero bit in the current string and copy to form an n-long
string.

As  an  example  consider  the  necklaces  produced  when  n  =  6: 


String    j    Result
111111         This is the first necklace (a1, a2, ..., a6).
111110    6    Since 6 divides 6, this is a necklace.
111101    5    Since 5 does not divide 6, this is not a necklace.
111100    6    Since 6 divides 6, this is a necklace.
111011    4    Since 4 does not divide 6, this is not a necklace.
111010    6    Since 6 divides 6, this is a necklace.
111001    5    Since 5 does not divide 6, this is not a necklace.
111000    6    Since 6 divides 6, this is a necklace.
110110    3    Since 3 divides 6, this is a necklace.
110101    5    Since 5 does not divide 6, this is not a necklace.
110100    6    Since 6 divides 6, this is a necklace.
110011    4    Since 4 does not divide 6, this is not a necklace.
110010    6    Since 6 divides 6, this is a necklace.
110001    5    Since 5 does not divide 6, this is not a necklace.
110000    6    Since 6 divides 6, this is a necklace.
101010    2    Since 2 divides 6, this is a necklace.
101001    5    Since 5 does not divide 6, this is not a necklace.
101000    6    Since 6 divides 6, this is a necklace.
100100    3    Since 3 divides 6, this is a necklace.
100010    4    Since 4 does not divide 6, this is not a necklace.
100001    5    Since 5 does not divide 6, this is not a necklace.
100000    6    Since 6 divides 6, this is a necklace.
000000    1    Since 1 divides 6, this is a necklace.


There are fourteen necklaces generated by the Necklace Algorithm; that is the
number given by equation 4. Additionally, it is easy to see that a string that ends with
a 1 (except all 1's, of course) can never be a necklace, since that 1 could be rotated to the
front of the n-tuple and the rotated n-tuple, being larger, will have already appeared as a
previous necklace in the lexicographic listing of the necklaces. For each value of j, the
string formed by the first j bits is a Lyndon word of length j. (See section C for a
discussion of Lyndon words.)

3.  Necklace  Algorithm  Pseudo  Code 


DESCRIPTION:  Finds  all  the  necklace  classes  of  length  n.  Note  the  indexing 
scheme  used  by  the  bit  arrays  of  this  method.  The  most  significant  bit  is 
considered  to  be  at  index  1  while  the  least  significant  bit  is  at  index  n. 

PRECONDITION:  The  array  classes  are  large  enough  to  hold  all  the  necklace 
classes  generated  by  the  algorithm. 

POSTCONDITION:  classes  contains  all  the  necklace  classes  of  length  n  sorted  in 
descending  numerical  order. 

    void getNecklaceClasses(/* IN */ int n,
                            /* OUT */ BitArray[] classes)
    {
        BitArray c = 2^n - 1;          // the first class is all 1's
        classes[0] = c;                // store it at index 0 of classes
        int k = 1;                     // index to store next class at
        while (c > 0)
        {
            int j = c.leastSig1();     // j gets index of least sig 1 in c
            c[j] = 0;                  // replace least sig 1 in c with 0
            int i = 1;
            int m = j + 1;             // next position to fill; j itself is kept for the j-check
            while (m <= n)
                c[m++] = c[i++];       // copy over, copy over, ...
            if (j | n)                 // j-check (j divides n), if TRUE it's a class
                classes[k++] = c;
        }
    }


Figure  5:  The  Necklace  Algorithm  Pseudo  Code 
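
For readers who want to run the algorithm, the following is a hedged Java sketch of the same procedure. It is not the code of Appendix A: it packs each n-tuple into a long (so n is limited to 62) instead of using a BitArray type, and the class name NecklaceList is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NecklaceList {
    /** Returns the necklace classes of length n (1 <= n <= 62), largest first. */
    static List<Long> necklaceClasses(int n) {
        List<Long> classes = new ArrayList<>();
        long c = (1L << n) - 1;                // the first class is all 1's
        classes.add(c);
        while (c > 0) {
            // j = position of the least significant 1, counted from 1 at the most significant bit
            int j = n - Long.numberOfTrailingZeros(c);
            c &= c - 1;                        // replace that 1 with 0
            long prefix = c >>> (n - j);       // the first j bits of c
            long next = 0;
            int filled = 0;
            while (filled + j <= n) {          // copy the whole prefix as often as it fits
                next = (next << j) | prefix;
                filled += j;
            }
            int rest = n - filled;             // then a partial copy of its leading bits
            next = (next << rest) | (prefix >>> (j - rest));
            c = next;
            if (n % j == 0) classes.add(c);    // j-check: a class only if j divides n
        }
        return classes;
    }
}
```

For n = 6 this returns the fourteen classes from 111111 down to 000000, in the order of the worked example.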

4.  Sub-cyclic Properties


As mentioned previously, the necklace algorithm generates all of the necklace
classes for a given n. These necklaces are listed lexicographically from the largest (1^n) to
the smallest (0^n). This is contrary to normal order, but the results are equivalent to an
ordering from smallest to largest, and it is easy to translate between the two listings.

One of the most interesting properties of the necklace algorithm is that the
sub-cyclic rotations also occur for each divisor of a given n. That is, all the m-long
necklaces appear where m is a factor of n. The factors of n tell us the lengths of the
sub-cyclic or repeating strings for the given n. Using the Möbius function [17] in place of
the totient function, one may enumerate the sub-cyclic rotations for the given factors of n.
In fact, regardless of the value of n, the number of sub-cyclic rotations of length m will
always be the same for the factors m of n. For example, since 1 is a factor of every n, there
are always two sub-cyclic rotations of length 1, namely 0 and 1, and the necklace (10)
appears in the necklace listing whenever n is even. Thus, when m|n, each m-long necklace
will appear in the output of the necklace algorithm for the parameter n.

In the example given, where n = 6 has the factors 1, 2, 3 and 6, the necklaces
that appear are all the necklaces for m = 1, 2, 3 and 6. If we modify the enumeration
formula in equation 4 by substituting for φ(d) the function μ(d), where μ(d) (the Möbius
function) is defined by

μ(d) = (-1)^k when d is a product of k distinct prime factors (so that μ(1) = 1);

μ(d) = 0 when d has a square or higher order factor,

we obtain the expression on the right hand side of equation 5.

We obtain the number of necklaces of length n by applying this modified
enumeration formula to each of the factors m of n.

$$N(m) = \frac{1}{m} \sum_{d \mid m} \mu(d) \, 2^{m/d}$$

Equation 5

So, when m is 1, there are 2 necklaces; when m is 2, there is 1 necklace. We
determine the number of necklaces when m is 3 to be 1/3(1*2^3 - 1*2) = 1/3(8 - 2) = 6/3 = 2.
These are the necklaces 110 and 100. Finally, we determine the number of necklaces
when m is 6 as 1/6(1*2^6 - 1*2^3 - 1*2^2 + 1*2) = 1/6(64 - 8 - 4 + 2) = 54/6 = 9. Therefore,
9 of the 14 necklace classes will be of length 6, as can be determined from our listing above
of the necklace algorithm. Thus the total number of necklaces, including all sub-cyclic
necklaces, for n = 6 is (2 + 1 + 2 + 9) = 14.
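
The arithmetic above can be reproduced with a short Java sketch of the Möbius-based count. The names AperiodicCount, mobius, countFullLength and countAll are hypothetical and chosen for illustration only.

```java
final class AperiodicCount {
    /** Moebius function: 0 if d has a squared prime factor, else (-1)^(number of prime factors). */
    static int mobius(int d) {
        int result = 1;
        for (int p = 2; p * p <= d; p++) {
            if (d % p == 0) {
                d /= p;
                if (d % p == 0) return 0;   // p squared divided the original d
                result = -result;
            }
        }
        if (d > 1) result = -result;        // one remaining prime factor
        return result;
    }

    /** N(m) = (1/m) * sum over divisors d of m of mobius(d) * 2^(m/d). */
    static long countFullLength(int m) {
        long sum = 0;
        for (int d = 1; d <= m; d++) {
            if (m % d == 0) sum += mobius(d) * (1L << (m / d));
        }
        return sum / m;
    }

    /** Sum of N(m) over all divisors m of n; this should equal Z[n]. */
    static long countAll(int n) {
        long total = 0;
        for (int m = 1; m <= n; m++) {
            if (n % m == 0) total += countFullLength(m);
        }
        return total;
    }
}
```

For n = 6 the divisor counts are 2, 1, 2 and 9, and their sum is the 14 classes given by the Burnside formula.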

These sub-cyclic rotations are important in our version of the data compression
algorithms because it is cheaper to represent a string with fewer distinct rotations than
one with more. Namely, it costs only 1 bit to represent the rotation of a necklace with
sub-cyclic length 2, whereas it costs 3 bits to represent all of the possible rotations of a
necklace of length 6.
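
The cost argument can be made concrete with a small Java sketch that finds the sub-cyclic length (period) of an n-bit string and the number of bits needed to distinguish its rotations. The sketch assumes the cost is ceil(log2 p) bits for period p, which matches the 1-bit and 3-bit figures above; the class name RotationCost is hypothetical.

```java
final class RotationCost {
    /** Rotates the n-bit value x left by one position (1 <= n <= 62). */
    static long rotateLeft(long x, int n) {
        long mask = (1L << n) - 1;
        return ((x << 1) | (x >>> (n - 1))) & mask;
    }

    /** Smallest p >= 1 such that rotating x by p positions gives x back. */
    static int period(long x, int n) {
        long y = rotateLeft(x, n);
        int p = 1;
        while (y != x) {
            y = rotateLeft(y, n);
            p++;
        }
        return p;
    }

    /** Bits needed to tell apart the p distinct rotations: ceil(log2 p). */
    static int rotationBits(long x, int n) {
        int p = period(x, n);
        int bits = 0;
        while ((1 << bits) < p) bits++;
        return bits;
    }
}
```

For example, 101010 has period 2 and needs 1 bit, while 111110 has period 6 and needs 3 bits.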

C.  LYNDON WORDS [18]


Although  the  two  lossless  compression  techniques  discussed  in  this  paper  do  not 
use  Lyndon  words,  we  cover  them  because  of  their  relation  to  necklaces  and  because  we 
recommend  them  for  future  research. 

A k-ary necklace is an equivalence class of k-ary strings under rotation. We take
the lexicographically smallest such string as the representative of each equivalence class
and use this in the output of the program. A Lyndon word is an aperiodic necklace
representative.

The illustration below shows the 6 binary necklaces with 4 beads and the
corresponding equivalence classes of strings. The three Lyndon words are 0001, 0011,
and 0111.

Necklace    Equivalence class
0000        0000
0001        0001, 0010, 0100, 1000
0011        0011, 0110, 1100, 1001
0101        0101, 1010
0111        0111, 1011, 1101, 1110
1111        1111

Figure 6:  Lyndon words of length four.
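
The Lyndon words of a given length can also be found by brute force, which makes the definition easy to test. The Java sketch below is illustrative only and is not part of the thesis code; it treats an n-bit integer as a binary string and relies on the fact that, for strings of equal length, numeric order and lexicographic order coincide. The class name LyndonWords is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LyndonWords {
    /** True if the n-bit value x is strictly smaller than each of its other n - 1 rotations. */
    static boolean isLyndon(int x, int n) {
        int mask = (1 << n) - 1;
        int y = x;
        for (int k = 1; k < n; k++) {
            y = ((y << 1) | (y >>> (n - 1))) & mask;   // rotate left by one
            if (y <= x) return false;   // a smaller rotation exists, or x is periodic
        }
        return true;
    }

    /** All binary Lyndon words of length n (1 <= n <= 30), as n-character strings. */
    static List<String> lyndonWords(int n) {
        List<String> words = new ArrayList<>();
        for (int x = 0; x < (1 << n); x++) {
            if (!isLyndon(x, n)) continue;
            StringBuilder sb = new StringBuilder(Integer.toBinaryString(x));
            while (sb.length() < n) sb.insert(0, '0');   // left-pad to n characters
            words.add(sb.toString());
        }
        return words;
    }
}
```

For n = 4 the sketch lists 0001, 0011 and 0111; periodic representatives such as 0000, 0101 and 1111 are rejected because one of their rotations equals the string itself.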


Equation 6 gives the explicit formula for the number of Lyndon words of length n
over a k-ary alphabet.

$$N_k(n) = \frac{1}{n} \sum_{d \mid n} \mu(n/d) \, k^{d}$$

Equation 6

IV.  ROTATIONAL  TREE  APPROACH


A.  INTRODUCTION 

In this chapter we exhibit the bulk of our research efforts. Our efforts to find a
compression application involving the binary necklace classes discussed in chapter III
led us to the Rotational Tree Approach (RTA). Our motivation for exploring the
approach arises from the following observation. Let A be the set of all bit strings of
length n. Further, define B to be the set of all necklace classes of length n. Consider the
onto mapping from the domain A to the codomain B, where each n-tuple maps to its
representative necklace class in B:


Figure  7:  Mapping  of  Bit  Strings  to  Their  Necklace  Class 

This mapping is attractive from a compression standpoint because there is a
significant space reduction between the number of bit strings and the number of necklace
classes to which they map, since there are 2^n bit strings of length n and fewer than
2^(n+1)/n necklace classes.


n     Bit Strings               Classes        Decrease
6     2^6  = 64                 14             78%
12    2^12 = 4,096              352            91%
16    2^16 = 65,536             4,116          94%
20    2^20 = 1,048,576          52,488         95%
24    2^24 = 16,777,216         699,252        96%
28    2^28 = 268,435,456        9,587,580      96%
32    2^32 = 4,294,967,296      134,219,796    97%

Table  5:  Space  Reduction  Afforded  by  Mapping  Strings  to  Necklace  Classes

B.  EXPLANATION 

In order to take advantage of the resulting space reduction, our idea is to map each
source symbol in A to its corresponding necklace class in B, and then represent that class
with an index. We define the index i of an n-bit class as the position of the class in the
ordered list of all n-bit classes as given by the Necklace Algorithm. Thus, by mapping a
symbol to an index class we can effectively decrease the number of distinct symbols as
well as the size of each such symbol. This sounds promising.

The  difficulty  is  that  there  are  many  n-bit  strings  that  map  to  the  same  index  class. 
Thus,  in  order  to  reverse  the  mapping,  which  is  necessary  for  decompression,  additional 
information  is  needed;  specifically,  we  need  to  know  the  number  of  rotations  r  originally 
used  to  transform  the  bit  string  to  its  necklace  class. 
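
The mapping and its inverse can be sketched in Java as follows. This is a sketch under stated assumptions rather than the code of Appendix A: the representative is taken to be the numerically largest rotation, r counts left rotations, and ties are broken in favor of the smallest r. The class name NecklaceMap and its methods are hypothetical.

```java
final class NecklaceMap {
    /**
     * Returns {representative, r}: the numerically largest rotation of the
     * n-bit symbol s (1 <= n <= 62), and the number r of left rotations that first reaches it.
     */
    static long[] toClass(long s, int n) {
        long mask = (1L << n) - 1;
        long best = s;
        long r = 0;
        long y = s;
        for (int k = 1; k < n; k++) {
            y = ((y << 1) | (y >>> (n - 1))) & mask;   // rotate left by one
            if (y > best) {
                best = y;
                r = k;
            }
        }
        return new long[] { best, r };
    }

    /** Inverse mapping: rotating the representative right by r restores the symbol. */
    static long fromClass(long representative, long r, int n) {
        long mask = (1L << n) - 1;
        int k = (int) (r % n);
        if (k == 0) return representative;
        return ((representative >>> k) | (representative << (n - k))) & mask;
    }
}
```

For the 8-bit symbol 01100001 (ASCII "a"), toClass yields the representative 11000010 with r = 1, and fromClass recovers the original symbol from that pair.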
What we have at this point is a vague model involving the quantities r (rotation)
and i (index). What we need is a clever way to use them. As r and i are fixed in length
(with respect to a given n), static Huffman encoding seems a good choice.

Our initial efforts revolved around a two-Huffman-tree approach. We proceed as
follows: We first read n-bit symbols from a source file. We then break each symbol into
an index i and a rotation r. We build one Huffman tree using all the r's and another using
all the i's. Finally, we write the Huffman codes for each r, i pair to the output file, in the
order in which they were first discovered.

Surprisingly (for us at least) this approach is a complete failure. It typically
achieves very low levels of compression (often even bloating the source file). After some
analysis involving comparisons against a standard single Huffman tree built with n-bit
symbols, we draw the following conclusions concerning this failed two-tree approach:

•  The i-Huffman tree is much narrower and shallower than a standard Huffman
tree built with n-bit symbols. This means the codes produced by the
i-Huffman tree are shorter. This is exactly the type of decrease in symbol
number and size that we are looking for.

•  The amount of statistical data (output file header) required for the two-tree
approach is significantly less than that required for standard Huffman (more
on this later).

•  The added expense of the r-codes paired with every i-code completely erases
the gains cited above.
Essentially, the r-codes are an added expense that the model cannot afford. What
we need is a way to make the r-codes either smaller or else somehow more useful to the
model. Our solution consisted of developing an approach involving multiple Huffman
trees. It proceeds as follows: We first read n-bit symbols from a source file. Then we
break each symbol into a rotation r and an index i. We put the r's into one Huffman tree,
and distribute the i's to a forest of Huffman trees based upon their associated r's. Finally,
we write the Huffman codes for each r, i pair to the output file, in the same order in which
they were discovered.

This approach puts the r-codes to work by using them to distribute the i-codes to
not one, but many Huffman trees. This is advantageous, as each tree in the forest will be
both narrower and shallower than the original i-tree, and as such will produce shorter
codes. The next section refines this procedure.

C.  ALGORITHM 

1.  Build a static statistical model of the data as follows:

a.  Construct n + 1 empty frequency tables (indexed from 0 to n), where n
represents the length in bits of the symbols read from the input stream.

b.  Read the next n-bit symbol S from the input stream.

c.  Determine the number r of rotations required to transform S into its
necklace class representative c.

d.  Determine the index i of c in an ordered list of all n-bit necklace classes.

e.  If i does not exist in frequency table r, put i into frequency table r with a
frequency count of 1. Else, increment the frequency count of i in
frequency table r.

f.  If r does not exist in frequency table n, put r into frequency table n with a
frequency count of 1. Else, increment the frequency count of r in
frequency table n.

g.  While there are more symbols in the input stream, repeat from step b.

2.  Create a Huffman tree from each of the n + 1 frequency tables. The leaves of
Huffman tree n represent rotations. The leaves of all other Huffman trees
represent class indices.

3.  Reset the input pointer to the beginning of the input stream.

4.  Read the next n-bit symbol S from the input stream.

5.  Determine r and i for S.

6.  Write the Huffman code for r in Huffman tree n to the output stream.

7.  Write the Huffman code for i in Huffman tree r to the output stream.

8.  While there are more symbols in the input stream, repeat from step 4.
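
Step 1 of the algorithm can be sketched in Java as shown below. The sketch is illustrative and makes several assumptions that are not taken from Appendix A: r counts left rotations to the numerically largest rotation, class indices start at 0 in a descending list of classes, the class list is built by brute force (practical only for small n), and the r table counts every symbol occurrence as step 1f prescribes. The class name RtaModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RtaModel {
    /** Returns {representative, r} for the n-bit symbol s, using left rotations. */
    static int[] toClass(int s, int n) {
        int mask = (1 << n) - 1, best = s, r = 0, y = s;
        for (int k = 1; k < n; k++) {
            y = ((y << 1) | (y >>> (n - 1))) & mask;
            if (y > best) { best = y; r = k; }
        }
        return new int[] { best, r };
    }

    /** All class representatives of length n in descending order (brute force, small n only). */
    static List<Integer> classList(int n) {
        List<Integer> classes = new ArrayList<>();
        for (int x = (1 << n) - 1; x >= 0; x--) {
            if (toClass(x, n)[0] == x) classes.add(x);
        }
        return classes;
    }

    /**
     * Step 1: tables.get(r) maps class index i to its count for r = 0..n-1,
     * and tables.get(n) maps each rotation r to its count.
     */
    static List<Map<Integer, Integer>> buildTables(int[] symbols, int n) {
        List<Integer> classes = classList(n);
        List<Map<Integer, Integer>> tables = new ArrayList<>();
        for (int t = 0; t <= n; t++) tables.add(new TreeMap<>());   // step a
        for (int s : symbols) {                                      // steps b and g
            int[] cr = toClass(s, n);                                // step c
            int i = classes.indexOf(cr[0]);                          // step d
            tables.get(cr[1]).merge(i, 1, Integer::sum);             // step e
            tables.get(n).merge(cr[1], 1, Integer::sum);             // step f
        }
        return tables;
    }
}
```

A production version would replace the linear indexOf lookup with a precomputed table from representative to index, which is the kind of table whose memory cost grows quickly with n.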

D.  EXAMPLE 


Consider a hypothetical 54-character message and its resultant breakdown:

Message:  acgCs7bnacgCs7bnacgCs7bnggCCCCssss77777bnbnbnbnbnbnbnn

Symbol   Binary Representation   Necklace Class   r    i    Frequency
a        01100001                11000010         1    26    3
c        01100011                11011000         6    18    3
g        01100111                11101100         5    10    5
C        01000011                11010000         6    21    7
s        01110011                11100110         1    13    7
7        00110111                11100110         5    13    8
b        01100010                11000100         1    25   10
n        01101110                11100110         4    13   11
Total                                                       54

Table  6:  Example  Message  Breakdown  by  RTA  (n  =  8)
Step 1 of the RTA algorithm constructs the following frequency tables:

Freq Table 1
i     Frequency
26    3
13    7
25    10

Freq Table 4
i     Frequency
13    11

Freq Table 5
i     Frequency
10    5
13    8

Freq Table 6
i     Frequency
18    3
21    7

Freq Table n = 8
r     Frequency
4     1
5     2
6     2
1     3

Figure  8:  RTA  Step  1:  Build  Frequency  Tables
Step 2 of the RTA algorithm constructs a Huffman tree for each frequency table.


Figure  9:  RTA  Step  2:  Build  Huffman  Trees
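
Step 2 can be illustrated with a generic Huffman construction that reports the code length of each leaf of one frequency table. The Java sketch below is not the thesis implementation: the class name HuffmanLengths is hypothetical, ties between equal weights are broken arbitrarily, and a table with a single leaf is assumed to receive a one-bit code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class HuffmanLengths {
    private static final class Node {
        final long weight;
        final Integer leaf;        // null for internal nodes
        final Node left, right;
        Node(long weight, Integer leaf, Node left, Node right) {
            this.weight = weight; this.leaf = leaf; this.left = left; this.right = right;
        }
    }

    /** Maps each leaf value of one frequency table to the length in bits of its Huffman code. */
    static Map<Integer, Integer> codeLengths(Map<Integer, Integer> freq) {
        PriorityQueue<Node> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (queue.size() > 1) {
            Node a = queue.poll(), b = queue.poll();   // the two lightest subtrees
            queue.add(new Node(a.weight + b.weight, null, a, b));
        }
        Map<Integer, Integer> lengths = new HashMap<>();
        if (!queue.isEmpty()) assign(queue.poll(), 0, lengths);
        return lengths;
    }

    private static void assign(Node node, int depth, Map<Integer, Integer> lengths) {
        if (node.leaf != null) {
            lengths.put(node.leaf, Math.max(depth, 1));   // a lone leaf still gets one bit
            return;
        }
        assign(node.left, depth + 1, lengths);
        assign(node.right, depth + 1, lengths);
    }
}
```

Applied to a table with frequencies 3, 7 and 10, the sketch assigns two bits to the two rarer leaves and one bit to the most frequent leaf. In RTA the same routine would be run once per frequency table to produce the forest.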
With the Huffman trees constructed, steps 3 through 8 of the RTA algorithm
create the output codes. This is accomplished by generating the r, i pairs
to replace each input symbol in the message. Thus, the stream of input symbols

acgCs7bna...

is transformed into the stream of r, i pairs:

1,26  6,18  5,10  6,21  1,13  5,13  1,25  4,13  1,26...

which then becomes the output stream:

11,10  10,0  01,0  10,1  11,11  01,1  11,0  00,0  11,10...

by replacing each r with its Huffman code from tree n = 8 and each i with its Huffman
code from tree r. For example, the r, i pair that corresponds to symbol "a" (found in
table 6 above) is 1, 26. The code that represents r from the n = 8 tree (figure 9 above) is
11. The code that represents the i component from Huffman tree 1 (figure 9 above) is 10.
Therefore the input symbol "a" becomes the output 1110. The process then moves to the
next symbol in the stream and repeats.

F.  INTERPRETATION  OF  RESULTS 

We successfully implemented RTA in the Java programming language (see
Appendix A) and collected empirical data (see Appendix B). The overall compression
performance of RTA is roughly equivalent to that of standard Huffman encoding on both
text and uncompressed bitmapped image data. Computationally, RTA is more expensive
than Huffman encoding, but not prohibitively so. However, our implementation of RTA
is space-hungry, requiring up to 5*2^n + (4*2^(n+1))/n bytes just to build the tables
needed to efficiently calculate r and i for all the input symbols. This calculation does not
include the memory needed to build the actual Huffman trees, which is data dependent
but can also be very significant. The compression results for RTA are in fact cut off at
n = 24 due to memory constraints: a typical 0.5 MB file requires 86 MB of
table-building memory alone!

The compression performance of RTA is most easily explained by once again
considering the failed two-tree approach discussed in part B above. By comparison with
standard Huffman encoding, the two-tree approach has a favorable header size, and a
favorable i-tree width and depth. Its fatal flaw is the overhead of the r-codes paired with
every i-code. RTA is burdened by the same r-code overhead, but unlike the two-tree
method, RTA uses its r-codes advantageously. By building a forest of i-trees instead of a
single i-tree, RTA is able to generate much shorter i-codes. This allows the RTA forest
to approach the performance of a single optimal Huffman tree, even with the added
overhead of the r-codes. A useful way to view the Huffman forest created by RTA is as a
single distributed pseudo-Huffman tree with the r-tree as the root, and the i-trees as
leaves.

Figure  10:  Distributed  Pseudo-Huffman  Tree  Created  by  RTA


It is easy to see that this particular distributed tree is not a true Huffman tree and
is therefore not optimal. Note that node i=10 from i-tree 5 is higher in the distributed tree
than node i=13 from i-tree 1, even though node i=13 has a greater frequency.

After studying the tree breakdowns of our empirical data (see Appendix D for an
example) we conclude that the weight of the distributed tree created by RTA is typically
within 5% of the weight of an optimal Huffman tree. While it is theoretically possible for
the distributed tree to be a true Huffman tree, it would be an extremely rare occurrence
when compressing real data.

Using  this  insight  into  the  distributed  tree  of  RTA,  we  conclude  that  the  sum  of 
the  lengths  of  the  i-codes  and  r-codes  produced  by  RTA  will  approach  the  sum  of  the 
codes  produced  by  a  standard  Huffman  tree  built  from  the  same  data.  The  problem  is,  if 
RTA  can  only  approach  Huffman  encoding,  then  how  is  it  that  the  empirical  data  shows 
RTA  marginally  outperforming  Huffman  encoding  on  almost  half  of  the  files  in  the  test 
suite?  The  answer  lies  in  the  compression  header.  RTA  requires  less  overhead  than 
standard  Huffman  encoding  to  pass  the  statistical  model  from  the  compression  program 
to  the  decompression  program. 

Model overhead is one of the drawbacks of static methods like Huffman
encoding, the two-tree approach, and RTA. However, without a statistical model the
decompression program has no way of uncompressing the codes in the compressed files
back into the symbols of the source file, so inclusion of the compression model is
necessary. Our implementations of Huffman, two-tree, and RTA all pass their static
models to the decompression program in the same way. Each includes a
frequency-ordered list of all the leaves of its Huffman tree(s) in its header. The
header includes other information as well (see Appendix E for details), but none so
voluminous as the leaf list.

The symbols we shall encode in our Huffman code are binary n-tuples. A first
pass through the data determines their probability of occurrence in the text. Armed with
these probabilities, a Huffman code can be made for the file under consideration, and the
leaves in the Huffman tree generated are the n-tuples appearing in the file. Then, in the
worst case, the leaf list for Huffman encoding will contain 2^n entries of n bits each. This
means the leaf list for a standard Huffman tree can require up to n * (2^n) / 8 bytes of
space.

In order to gain insight into the amount of space consumed by the leaf lists of
RTA, we again turn to our failed two-tree approach. We noted earlier that two-tree had a
significantly smaller header than standard Huffman encoding. This is a consequence of
the space reduction afforded by using necklace class indices as leaves of the i-tree. Even
though two-tree must put two leaf lists in its header, one for the r-tree and another for the
i-tree, we see that the two-tree method still requires much less overhead than standard
Huffman. The length and number of the leaves in the leaf lists of the two-tree method are
governed by the properties of binary necklace classes discussed in chapter III. The
relevant information is summarized in Table 7.


          Leaf Length (bits)      Upper Bound on Number of Leaves
r-tree    ⌈log2 n⌉                n
i-tree    n + 1 - ⌈log2 n⌉        2^(n + 1 - ⌈log2 n⌉)

Table 7:  Leaf  List  Properties

An  upper  bound  U  on  the  maximum  leaf  list  size  (in  bytes)  for  the  two-tree 
approach  is  found  by  summing  the  product  of  the  leaf  length  and  the  number  of  leaves  for 
each  leaf  list  then  dividing  by  8,  or 

$$U = \frac{\lceil \log_2 n \rceil \cdot n + (n + 1 - \lceil \log_2 n \rceil) \cdot 2^{\,n + 1 - \lceil \log_2 n \rceil}}{8}.$$
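
The bound U is easy to evaluate numerically. The Java sketch below assumes the reading of the formula in which the r-tree has at most n leaves of ceil(log2 n) bits each and the i-tree has at most 2^(n + 1 - ceil(log2 n)) leaves of n + 1 - ceil(log2 n) bits each; the class name LeafListBound is hypothetical.

```java
final class LeafListBound {
    /** Smallest b with 2^b >= n, i.e. ceil(log2 n). */
    static int ceilLog2(int n) {
        int bits = 0;
        while ((1 << bits) < n) bits++;
        return bits;
    }

    /** Worst-case two-tree leaf list size in bytes, as a real number. */
    static double twoTreeBytes(int n) {
        int rBits = ceilLog2(n);                              // leaf length in the r-tree
        int iBits = n + 1 - rBits;                            // leaf length in the i-tree
        double rList = (double) rBits * n;                    // at most n rotations
        double iList = (double) iBits * Math.pow(2, iBits);   // at most 2^iBits class indices
        return (rList + iList) / 8.0;
    }
}
```

Under these assumptions the sketch gives 4 bytes for n = 4, 51 bytes for n = 8 and 582 bytes for n = 12.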

The following table points out the drastic differences possible between the leaf
lists of standard Huffman and those of the two-tree approach.

n                    4      8      12       16          20           24
Standard Huffman     8    256   4,096   65,536   1,048,576   16,777,216
Two-Tree Approach    4     51     582   13,320     131,085    2,621,455

Table  8:  Worst  Case  Leaf  List  Comparisons

When run on typical data, the actual differences between the leaf lists of the two
approaches are never as dramatic as the worst-case scenario shown above. If they had
been, two-tree would likely have been a viable approach.

We now draw comparative conclusions between the leaf lists of RTA and those of
two-tree. The r-trees created by the two approaches are identical. Therefore, the r-tree
leaf lists are identical. The difference between the two approaches is simply the number
of i-trees created. Two-tree creates one long i-tree leaf list. RTA creates multiple shorter
i-tree leaf lists. Since both approaches process the same r, i pairs, it is tempting to
conclude that the concatenation of the smaller leaf lists of RTA would result in the longer
leaf list of two-tree. If this were true, the overhead of the two approaches would be
almost identical. An examination of the distributed tree depicted in Figure 10 above,
however, reveals the flaw in this supposition. There are overlaps between the leaves of
several of the i-trees. Specifically, i-trees 1, 4 and 5 all have an i = 13 leaf. This
demonstrates that in RTA some of the leaves may occur more than once in the header.
Thus, the header of RTA will be larger than that of two-tree by the number of
overlapping leaves.
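
The effect of overlapping leaves can be counted directly from a stream of r, i pairs. The Java sketch below is illustrative only: it compares the number of i-tree leaves in a single shared tree (distinct values of i) with the number in a forest keyed by r (distinct r, i pairs). The class name LeafOverlap is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class LeafOverlap {
    /** Each pair is {r, i}. Returns {twoTreeLeaves, rtaLeaves} for the i-tree leaf lists. */
    static int[] countLeaves(int[][] pairs) {
        Set<Integer> distinctI = new HashSet<>();    // one shared i-tree
        Set<Long> distinctPairs = new HashSet<>();   // one i-tree per rotation r
        for (int[] p : pairs) {
            distinctI.add(p[1]);
            distinctPairs.add(((long) p[0] << 32) | (p[1] & 0xFFFFFFFFL));
        }
        return new int[] { distinctI.size(), distinctPairs.size() };
    }
}
```

For the pairs 1,26  6,18  5,10  6,21  1,13  5,13  1,25  4,13  1,26 the shared tree has 6 leaves and the forest has 8; the difference of 2 comes from i = 13 appearing in three different i-trees.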

The actual number of overlapping leaves in RTA is data dependent. If the
number of overlapping leaves is low, we expect a header size similar to that of two-tree
and well below that of Huffman. If the number of overlapping leaves in RTA is high, it
is possible that the header could approach or even exceed that of standard Huffman. The
actual header sizes found for the test suite files using RTA and standard Huffman
encoding are shown below.


                    Selected n
File                8      12      16       20
kennedy.xls       253    1552    2899    26869
plrabn12.txt       98     988    1891    12346
lcet10.txt        100    1443    2931    16068
asyoulik.txt       86    1056    1817    10164
alice29.txt        91    1107    1965    10074
grammar.lsp        93     493     652     1501
cp.html           104    1109    2063     6351
fields.c          106     821    1150     3104
xargs.1            91     601     802     1926
sum               250    2073    4058    10274
ptt5              179     961    3958    11429
lenna             255    4499   58125   358290

Table  9:  Header  Size  for  RTA


                    Selected n
File                8      12      16       20
kennedy.xls       276    1883    3342    33171
plrabn12.txt       98    1188    2203    [illegible]
lcet10.txt         99    1786    3456    [illegible]
asyoulik.txt       83    1283    2111    [illegible]
alice29.txt        90    1348    2289    [illegible]
grammar.lsp        91     576     728    [illegible]
cp.html           100    1357    2409    [illegible]
fields.c          106     984    1311    [illegible]
xargs.1            87     711     904     2260
sum               273    2608    4805    12557
ptt5              177    1145    4670    13971
lenna             278    5768   71159   447245

Table  10:  Header  Size  for  Standard  Huffman  Encoding


Tables 9 and 10 demonstrate that the RTA header is in fact typically smaller than
the header of standard Huffman encoding for n > 8; at n = 8 the two headers are within
a few bytes of each other for most files. Further, it is apparent that as n increases the
gap in header size widens in favor of RTA over Huffman. This fits well with the
empirical data (see Appendix B) that shows RTA performing best at large values of n.
Thus, we have probably identified the key element responsible for the (limited) success
of RTA: decreased header size. This is no surprise, as headers are the likely place to
economize over the provably optimal encoding.

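
As a rough illustration of the trend in Tables 9 and 10, the following Java sketch computes the absolute and relative header savings of RTA over standard Huffman encoding for the kennedy.xls row. The class and method names are hypothetical and are not part of the thesis code; only the eight header sizes are taken from the tables.

```java
/** Hypothetical helper for comparing header sizes; not part of the thesis code. */
final class HeaderSavings {

    /** Bytes saved by the RTA header relative to the Huffman header. */
    static int gapBytes(int huffmanHeader, int rtaHeader) {
        return huffmanHeader - rtaHeader;
    }

    /** The same saving expressed as a percentage of the Huffman header. */
    static double gapPercent(int huffmanHeader, int rtaHeader) {
        return 100.0 * gapBytes(huffmanHeader, rtaHeader) / huffmanHeader;
    }

    public static void main(String[] args) {
        int[] n       = {8, 12, 16, 20};
        int[] huffman = {276, 1883, 3342, 33171}; // kennedy.xls, Table 10
        int[] rta     = {253, 1552, 2899, 26869}; // kennedy.xls, Table 9
        for (int k = 0; k < n.length; k++) {
            System.out.printf("n = %2d: gap = %5d bytes (%.1f%%)%n",
                    n[k], gapBytes(huffman[k], rta[k]), gapPercent(huffman[k], rta[k]));
        }
    }
}
```

For this file the absolute gap grows from 23 bytes at n = 8 to 6302 bytes at n = 20.
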
G.  SUMMARY 

We now summarize our conclusions from the previous section regarding RTA vs.
standard Huffman encoding.

1. RTA compresses typical text data to within ±0.5% of Huffman encoding.

2. RTA is computationally more expensive than Huffman encoding, though
not prohibitively so.

3. RTA is space (memory) inefficient to the point of being infeasible.

4. The non-header portion of a file compressed by RTA typically comes
within 5% of the non-header portion of the same file compressed by
standard Huffman encoding.

5. RTA has a more compact header than standard Huffman encoding. The
difference between the headers increases as n increases. Thus, RTA
performs best on large files with a high n.

V. INDEXED TREE APPROACH

A.  INTRODUCTION 

In  this  chapter,  we  attempt  to  improve  upon  RTA  by  causing  the  distributed  tree 
to  more  closely  approach  an  optimal  Huffman  tree.  We  also  seek  to  further  decrease  the 
size of the header and to reduce the memory requirements of our earlier approach.

One  way  to  improve  the  distributed  tree  created  by  RTA  is  to  decrease  the  depth 
and  breadth  of  the  i-trees.  It  is  tempting  to  conclude  that  this  can  be  accomplished  simply 
by  increasing  the  total  number  of  i-trees.  Unfortunately,  increasing  the  number  of  i-trees 
requires  an  increase  in  the  length  of  each  r-code,  which  could  prove  detrimental  to  the 
overall  compression  ratio.  Thus,  we  consider  a  more  conservative  approach  to  improving 
the  distributed  tree  of  RTA. 


[Chart 1: Dispersion of ASCII Characters by RTA (n = 8). Axis label: Index Tree.]

Table 21 summarizes the mapping between ASCII keyboard characters and the
index trees of RTA (see Appendix B for details). The graph provides some indication of
how well RTA disperses i values to its forest of i-trees. It seems
reasonable that a more even distribution between index trees would benefit the distributed
tree of RTA. In fact, analysis of the actual index trees created by RTA when run on the
test suite data shows that as n increases many trees are often left empty. This suggests that
we can enhance the distributed tree of RTA by dispersing i values more evenly
to the i-tree forest.

B.  EXPLANATION 

Our Indexed Tree Approach (ITA) completely discards necklace classes and
rotations as a mechanism to create a Huffman forest. Instead, as we read an input symbol
S we use the first few bits of S (iBits) as a straightforward index into our Huffman forest.
The remaining bits of S (jBits) are then used as a leaf value for the tree indexed by iBits.
We make the following clarifying remarks:

•  S is the concatenation of iBits with jBits.

•  |iBits| + |jBits| = n.

•  The integer representation of the unsigned value iBits is denoted by [iBits].

ITA proceeds as follows: We first read n-bit symbols from a source. We then divide
each symbol into its respective iBits and jBits components. We put the iBits into one
Huffman tree, and distribute the jBits into a forest of Huffman trees based upon the value
of their associated iBits components. Finally, we write the Huffman codes for each iBits,
jBits pair to the output file in the order they were originally discovered.

Clearly, ITA is a derivative of RTA. ITA creates a similar Huffman forest, and
constructs similar ordered pairs of Huffman codes. It is, however, much more flexible
than RTA. In RTA, a given value n determines the length of the r- and i- components of
the algorithm. In ITA, the sum of |iBits| and |jBits| determines the value of n. Thus, there
is only one way for RTA to compress a given file at a given n, while there are many ways
for ITA to compress the same file using the same n. For example, if n = 12, ITA can be
executed with the following (|iBits|, |jBits|) pairs: (1, 11), (2, 10), (3, 9), (4, 8), (5, 7),
(6, 6), (7, 5), (8, 4), (9, 3), (10, 2) and (11, 1).
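
The division of a symbol S into iBits and jBits can be sketched in a few lines of Java. This is a minimal illustration, not the thesis implementation: the class SymbolSplit and its methods are hypothetical names, and the sketch assumes that S is held in the low n bits of an int with 1 <= |iBits| < n <= 31.

```java
/** Hypothetical helper illustrating the iBits / jBits split; not part of the thesis code. */
final class SymbolSplit {

    /** Returns [iBits], the leading iLen bits of the n-bit symbol as an unsigned integer. */
    static int iBits(int symbol, int n, int iLen) {
        return symbol >>> (n - iLen);
    }

    /** Returns the trailing n - iLen bits of the n-bit symbol. */
    static int jBits(int symbol, int n, int iLen) {
        int jLen = n - iLen;
        return symbol & ((1 << jLen) - 1);
    }

    public static void main(String[] args) {
        int n = 12;
        int s = 0b101101100111; // one 12-bit symbol S

        // The same symbol under two of the (|iBits|, |jBits|) pairs listed above.
        System.out.println("(4, 8): [iBits] = " + iBits(s, n, 4) + ", jBits = " + jBits(s, n, 4));
        System.out.println("(6, 6): [iBits] = " + iBits(s, n, 6) + ", jBits = " + jBits(s, n, 6));
    }
}
```

With the pair (4, 8) the symbol above selects jBits tree 11 and stores the leaf value 103; with (6, 6) it selects tree 45 and stores 39. The same input therefore populates a different forest for each choice of pair.
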

C.  ALGORITHM 

1. Build a static statistical model of the data as follows:

a. Construct n + 1 empty frequency tables (indexed from 0 to n) where n > 1
represents the length in bits of the symbols to read from the input stream.

b. Read the next n-bit symbol S from the input stream.

c. Divide S into its iBits and jBits components.

d. If iBits does not exist in frequency table n, put iBits into frequency table n
with frequency 1. Else, increment the frequency count of iBits in table n.

e. If jBits does not exist in frequency table [iBits], put jBits into frequency
table [iBits] with frequency 1. Else, increment the frequency count of
jBits in table [iBits].

f. While more symbols remain in the input stream, repeat from step b.

2. Create a Huffman tree from each of the n + 1 frequency tables. The leaves of
Huffman tree n represent iBits. The leaves of all other Huffman trees
represent jBits.

3. Reset the input pointer to the beginning of the input stream.

4. Read the next n-bit symbol S from the input stream.

5. Divide S into its iBits and jBits components.

6. Write the Huffman code for iBits in Huffman tree n to the output stream.

7. Write the Huffman code for jBits in Huffman tree [iBits] to the output stream.

8. While more symbols remain in the input stream, repeat from step 4.
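
Step 1 of the algorithm (building the static model) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the Appendix A code: the class ItaModel is hypothetical, the symbols are assumed to have been read already into an int array, and the sketch allocates one jBits table per possible value of [iBits] (2^|iBits| tables) plus a separate iBits table, which may not match the table numbering used in step 1a.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical frequency model for ITA; an illustration only. */
final class ItaModel {

    /** Frequency of each iBits value (the single "index" table). */
    final Map<Integer, Integer> iFreq = new HashMap<>();

    /** jFreq.get(i) holds the jBits frequencies for symbols whose [iBits] equals i. */
    final List<Map<Integer, Integer>> jFreq = new ArrayList<>();

    ItaModel(int[] symbols, int n, int iLen) {
        int jLen = n - iLen;
        // Assumption: one (initially empty) jBits table per possible iBits value.
        for (int t = 0; t < (1 << iLen); t++) {
            jFreq.add(new HashMap<>());
        }
        for (int s : symbols) {
            int i = s >>> jLen;              // step c: leading iLen bits
            int j = s & ((1 << jLen) - 1);   // step c: trailing jLen bits
            iFreq.merge(i, 1, Integer::sum);           // step d
            jFreq.get(i).merge(j, 1, Integer::sum);    // step e
        }
    }
}
```

Each non-empty table would then be handed to an ordinary Huffman tree builder (step 2); tables that remain empty need no tree at all.
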

D.  INTERPRETATION  OF  RESULTS 

We successfully implemented ITA in Java (see Appendix A) and collected
empirical data (see Appendix B). In general, the compression performance of ITA is
superior to that of standard Huffman encoding (and RTA) by a margin of one to
three percent on text data. This difference is due to further decreases in the compressed
file header size. ITA's performance is roughly equivalent to that of Huffman encoding
on uncompressed bitmapped images. ITA is computationally more expensive than
Huffman encoding, though like RTA, the difference is not apparent to the user. ITA does
not suffer from the memory problems of RTA. In fact, it generally has a smaller memory
footprint than does Huffman encoding.

Inspection of Table 12 below (extracted from our empirical results, see Appendix
B) reveals that the performance gains made by ITA over Huffman and RTA are primarily
attributable to further decreases in compressed file header size.

                   Optimal    Huffman      RTA      ITA     Huffman    RTA Non-    ITA Non-
File            n  (|iBits|,   Header   Header   Header  Non-Header      Header      Header
                   |jBits|)

kennedy.xls     8  (4, 4)         276      253      191      462532      528207      516085
               12  (5, 7)        1883     1552     1324      495537      502365      505657
               16  (7, 9)        3342     2899     2476      410583      422265      421657
               20  (9, 11)      33171    26869    20596      422785      426101      428105

plrabn12.txt    8  (4, 4)          98       98       88      275585      288973      283622
               12  (4, 8)        1188      988      910      306646      308099      306952
               16  (5, 11)       2203     1891     1630      238776      241024      239739
               20  (9, 11)      15159    12346     9652      255407      255701      256850

lcet10.txt      8  (4, 4)          99      100       85      250565      260360      255465
               12  (4, 8)        1786     1443     1324      280698      282929      281309
               16  (5, 11)       3456     2931     2499      218529      220143      218889
               20  (9, 11)      19763    16068    12453      231209      231767      232403

asyoulik.txt    8  (4, 4)          83       86       76       75806       78363       77650
               12  (5, 7)        1283     1056      907       83237       83700       83528
               16  (7, 9)        2111     1817     1469       64531       64969       64891
               20  (9, 11)      12444    10164     8151       67550       67644       67834

alice29.txt     8  (4, 4)          90       91       82       87688       91563       90150
               12  (4, 8)        1348     1107     1020       98274       98839       98409
               16  (5, 11)       2289     1965     1681       76120       77014       76522
               20  (9, 11)      12337    10074     8076       80171       80324       80626

grammar.lsp     8  (4, 4)          91       93       83        2170        2227        2205
               12  (5, 7)         576      493      466        2289        2319        2284
               16  (7, 9)         728      652      630        1703        1715        1718
               20  (6, 14)       1746     1501     1410        1645        1656        1650

cp.html         8  (4, 4)         100      104       89       16199       16389       16575
               12  (6, 6)        1357     1109      927       17268       17422       17432
               16  (7, 9)        2409     2063     1652       13337       13389       13419
               20  (8, 12)       7715     6351     5390       12909       12974       12967

fields.c        8  (2, 6)         106      106       90        7026        7104        7027
               12  (5, 7)         984      821      701        7411        8391        8290
               16  (6, 10)       1311     1150      972        5529        5538        5552
               20  (7, 13)       3703     3104     2808        5408        5436        5429

xargs.1         8  (2, 6)          87       91       81        2602        2662        2645
               12  (5, 7)         711      601      528        2792        2809        2802
               16  (7, 9)         904      802      697        2113        2127        2122
               20  (8, 12)       2260     1926     1780        2000        2006        2009

sum             8  (2, 6)         273      250      229       25645       26706       26238
               12  (5, 7)        2608     2073     1735       24733       25221       24984
               16  (7, 9)        4805     4058     3213       19977       20242       20167
               20  (8, 12)      12557    10274     8681       19355       19503       19440

ptt5            8  (3, 5)         177      179      160      106551      158044      158279
               12  (4, 8)        1145      961      866       86993      119953      119939
               16  (6, 10)       4670     3958     3227       76523      100685      100568
               20  (9, 11)      13971    11429     8911       70124       88956       88886

Table 12: Header and Non-header Compression Results (in bytes)

In order to explain the decrease in header size from RTA to ITA we draw
attention to the "optimal (|iBits|, |jBits|)" values displayed in column 3 of Table 12. In
most cases |iBits| and |jBits| are tightly centered on n/2. The reason for this goes back to
the leaf lists discussed in Chapter 4. The length and number of leaves in the leaf lists of a
single iBits and jBits tree are given below.


              Leaf Length (bits)    Upper Bound on Number of Leaves

iBits tree    |iBits|               2^|iBits|
jBits tree    |jBits|               2^|jBits|

Table 13: Leaf Lists in a Single iBits and jBits Tree


An upper bound on the maximum leaf list size in bytes for a single iBits and jBits tree is
found by summing the product of the leaf length and the number of leaves for each leaf
list, then dividing by 8. Table 14, below, shows how the upper bound for the size of the
combined leaf lists of a single iBits and jBits tree changes as |iBits| increases and |jBits|
decreases.


(|iBits|, |jBits|)    n = 10

(1, 9)                   576
(2, 8)                   257
(3, 7)                   115
(4, 6)                    56
(5, 5)                    40
(6, 4)                    56
(7, 3)                   115
(8, 2)                   257
(9, 1)                   576

Table 14: Sum of Leaf List Upper Bounds for a Single iBits and jBits Tree

Table 14 illustrates how (|iBits|, |jBits|) = (n/2, n/2) is a minimum of
f(|iBits|, |jBits|) = (|iBits| * 2^|iBits| + |jBits| * 2^|jBits|) / 8. Thus (|iBits|, |jBits|) = (n/2, n/2)
produces a minimal size header for a hypothetical two-tree ITA. Since the actual ITA
uses many jBits trees we would expect some repeated leaves in the header (just as in
RTA). Thus, while it is possible for the ITA header to exceed the shown values, the table
suggests optimal (|iBits|, |jBits|) pairs that fit well with the empirical data.
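
The bound f can be checked numerically. The following Java sketch, which is an illustration with hypothetical names rather than part of the thesis code, evaluates f for every (|iBits|, |jBits|) pair at n = 10. It assumes that fractional bytes are truncated by integer division, which is consistent with the values shown in Table 14.

```java
/** Hypothetical helper that evaluates the leaf list upper bound f; not part of the thesis code. */
final class LeafListBound {

    /**
     * Upper bound, in whole bytes, on the combined leaf lists of one iBits tree
     * and one jBits tree: (iLen * 2^iLen + jLen * 2^jLen) / 8, truncated.
     */
    static long boundBytes(int iLen, int jLen) {
        long bits = (long) iLen * (1L << iLen) + (long) jLen * (1L << jLen);
        return bits / 8;
    }

    public static void main(String[] args) {
        int n = 10;
        for (int iLen = 1; iLen < n; iLen++) {
            int jLen = n - iLen;
            System.out.println("(" + iLen + ", " + jLen + ")  " + boundBytes(iLen, jLen));
        }
    }
}
```

Because f is symmetric in its two arguments and each term grows exponentially with its bit count, the sum is smallest when the two lengths are equal, here 40 bytes at (5, 5).
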

Naturally, (|iBits|, |jBits|) = (n/2, n/2) need not always be the exact optimal value.
Indeed, this decomposition is only optimal in about 15% of our test cases. The most
common optimal (|iBits|, |jBits|) pairs for n = 8, 12, 16, and 20 are (4, 4), (5, 7), (7, 9),
and (9, 11) respectively. Thus, there are other factors at work besides simply header
overhead. The most critical factor in overall compression results for ITA is the degree of
similarity between ITA's distributed tree and an optimal Huffman tree constructed with
the same data. If the distributed tree performs poorly, any gains made by reduced header
overhead are quickly lost.

As mentioned earlier, ITA is more memory efficient than standard Huffman
encoding. The reasons for this are similar to those behind the smaller header: it is
simply cheaper to build many small Huffman trees than it is to build a single large one.

Since our performance gains over RTA are due to smaller header size, we
conclude that ITA is no better at dispersing jBits than RTA was at dispersing i-values.
Thus, although we are able to enhance the overall compression performance of RTA, we
are unable to improve its distributed tree.
VI. CONCLUSIONS


A. SUMMARY OF MAIN RESULTS

We have presented two lossless compression techniques. Both techniques use a
static statistical data model. Our first approach (RTA) involves binary necklace classes
and multiple Huffman trees.

1. RTA compresses typical text data to within ±0.5% of standard Huffman
encoding.

2. RTA is computationally more expensive than Huffman encoding, though not
prohibitively so.

3. RTA is space- (memory-) inefficient to the point of being infeasible.

4. The non-header portion of a file compressed by RTA typically comes within
5% of the non-header portion of the same file compressed by standard
Huffman encoding.

5. RTA has a more compact header than standard Huffman encoding. The
difference between the headers increases as n increases. Thus, RTA performs
best on large files with a high n.

Our second lossless compression approach (ITA) uses the same basic model as
RTA, but abandons binary necklaces in favor of a simpler and more efficient tree indexing
mechanism.

6.  ITA  compresses  typical  text  data  one  to  three  percent  better  than  standard 
Huffman  encoding.  Compression  of  uncompressed  bitmapped  images  is 
roughly  equivalent  to  that  of  Huffman  encoding. 

7.  ITA  is  computationally  more  expensive  than  Huffman  encoding,  though  not 
prohibitively  so. 

8.  ITA  requires  less  memory  than  standard  Huffman  encoding. 

9.  The  non-header  portion  of  a  file  compressed  by  ITA  typically  comes  within 
5%  of  the  non-header  portion  of  the  same  file  compressed  by  standard 
Huffman  encoding. 

10.  ITA  has  a  more  compact  header  than  standard  Huffman  encoding.  The 
difference  between  the  headers  tends  toward  a  maximum  at  values  centered  on 
n/2  and  becomes  more  pronounced  as  n  increases. 

B.  FUTURE  RESEARCH 

This paper explores the use of binary necklace classes for lossless compression
using a static statistical model. Our conclusions indicate that it is cheaper to pass header
data for many small trees rather than a single large tree. Our approaches use one level of
indirection (a single level of indirection equates to one thing pointing at many things) - a
single tree pointing at a single forest. An interesting extension might be to have two or
more levels of indirection. With two levels of indirection a single tree would point at a
forest of trees, and each tree in the forest would then point at another forest. This
approach would greatly reduce the leaf length of each tree. The drawback is that the total
number of trees would increase exponentially - thus greatly increasing the chance of
overlapping leaves, something we found to be a drawback.
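
One possible reading of the two-level idea, offered only as a speculative sketch, is to split each n-bit symbol into three fields instead of two: the first field selects a tree in the first forest, and the first two fields together select a tree in the second forest. The class, method, and field names below are hypothetical; nothing like this was implemented or measured in this work.

```java
/** Speculative sketch of a two-level split; hypothetical and not part of the thesis code. */
final class TwoLevelSplit {

    /** Splits an n-bit symbol into fields of aLen, bLen and n - aLen - bLen bits. */
    static int[] split(int symbol, int n, int aLen, int bLen) {
        int cLen = n - aLen - bLen;
        int a = symbol >>> (bLen + cLen);               // leaf of the single root tree
        int b = (symbol >>> cLen) & ((1 << bLen) - 1);  // leaf of first-forest tree number a
        int c = symbol & ((1 << cLen) - 1);             // leaf of the second-forest tree chosen by (a, b)
        return new int[] {a, b, c};
    }

    /** Upper bound on the tree count: 1 root + 2^aLen + 2^(aLen + bLen). */
    static long maxTrees(int aLen, int bLen) {
        return 1L + (1L << aLen) + (1L << (aLen + bLen));
    }

    public static void main(String[] args) {
        int[] f = split(0b101101100111, 12, 4, 4);
        System.out.println("a = " + f[0] + ", b = " + f[1] + ", c = " + f[2]);
        System.out.println("at most " + maxTrees(4, 4) + " trees");
    }
}
```

For n = 12 split as (4, 4, 4), every leaf is only 4 bits long, but up to 273 trees may be needed, which shows the exponential growth in tree count noted above.
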

A  mathematical  topic  closely  related  to  necklace  classes  is  that  of  Lyndon  words. 
Lyndon  words  offer  an  opportunity  to  explore  lossless  compression  using  a  dictionary- 
based  model.  Lyndon  words  exhibit  some  interesting  properties,  which  may  be 
applicable  to  data  compression.  Chapter  3  includes  a  short  discussion  on  Lyndon  words. 
Appendix A contains some preliminary Java code that generates an interesting subset
of the Lyndon words.
APPENDIX A: JAVA CODE


package thesis.compression.multi_tree;


/**
 * Allows C++ style assertions to be inserted in Java code for testing
 * and debugging. Note that "assert" is a reserved word from Java 1.4
 * onward, so these method names compile only at earlier source levels.
 */
public class Assertion {

    public static boolean assertOn = true;

    private Assertion() {} // no public constructor

    public static void assert(boolean validFlag) throws AssertionException {
        if (assertOn && !validFlag) {
            throw new AssertionException();
        }
    }

    public static void assert(boolean validFlag, String msg) throws AssertionException {
        if (assertOn && !validFlag) {
            throw new AssertionException(msg);
        }
    }
}


package thesis.compression.multi_tree;

public class AssertionException extends RuntimeException {

    public AssertionException() {
        super("AssertionException");
    }

    public AssertionException(String msg) {
        super(msg);
    }
}


package thesis.compression.multi_tree;

/**
 * This class is an abstraction for a string of bits. BitString objects have length and
 * bitPattern attributes. This implementation limits the length of a BitString to 32 bits.
 * BitString objects are mutable. This means that both the length and bitPattern attributes
 * of an instance can be changed by setter methods after the instantiation of the object.
 */
public class BitString implements Comparable {

    public static final short MAXLEN;
    public static final int[] MASK;   // MASK[i]: only bit i set
    public static final int[] MASK_R; // MASK_R[i]: bits 0 through i set
    public static final int[] MASK_L; // MASK_L[i]: bits i through 31 set
    static {
        MAXLEN = 32;
        MASK = new int[MAXLEN];
        MASK_R = new int[MAXLEN];
        MASK_L = new int[MAXLEN];
        for (int i = 0; i < MAXLEN; i++) {
            MASK[i] = 1 << i;
            MASK_R[i] = -1 >>> (31 - i);
            MASK_L[i] = -1 << i;
        }
    }

    private short length;
    private int bitPattern;

    /**
     * Returns one more than the number of bits to the right of the most significant
     * bit in bitPattern. Alternatively, this may be thought of as the minimum length
     * required to capture all the significant bits of bitPattern.
     */
    public static short length(int bitPattern) {
        for (int i = 0; i < 32; i++) {
            if (bitPattern < 0) {
                return (short) (32 - i);
            }
            bitPattern <<= 1;
        }
        return 0;
    }

    /**
     * Operates like System.arraycopy except on BitStrings instead of arrays.
     *
     * @throws IndexOutOfBoundsException if a copy operation would exceed the bounds
     * of either the source or destination BitString object.
     */
    public static void bitCopy(BitString src,
                               int srcPos,
                               BitString dst,
                               int dstPos,
                               int numBits) throws IndexOutOfBoundsException {

        int srcEnd = srcPos + numBits - 1;
        int dstEnd = dstPos + numBits - 1;

        if (srcPos < 0 || dstPos < 0 || numBits < 0 ||
            srcEnd >= src.length || dstEnd >= dst.length) {
            throw new IndexOutOfBoundsException();
        }

        // isolate the source bit field
        int s = src.bitPattern;
        int d = dst.bitPattern;
        int m = MASK_L[srcPos] & MASK_R[srcEnd];
        s &= m;

        // align the mask and source bit field with the destination bit field
        if (srcPos < dstPos) {
            int lshift = dstPos - srcPos;
            s <<= lshift;
            m <<= lshift;
        } else {
            int rshift = srcPos - dstPos;
            s >>>= rshift;
            m >>>= rshift;
        }

        // clear the dest bit field then OR in the source bit field
        dst.bitPattern = s | (d & (~m));
    }


    /**
     * Returns the BitString object represented by str.
     *
     * @throws NumberFormatException just as Integer.parseInt(str, 2) would.
     */
    public static BitString parseBitString(String str) throws NumberFormatException {
        return new BitString(str.length(), Integer.parseInt(str, 2));
    }

    /**
     * Creates a BitString[] to represent intArr. All BitStrings of the returned array
     * will be long enough to contain the greatest value in the int[].
     */
    public static BitString[] toBitStringArr(int[] intArr) {
        if (intArr == null) return null;

        // find greatest int value in the array
        int greatest = intArr[0];
        for (int i = 1; i < intArr.length; i++) {
            if (intArr[i] > greatest) {
                greatest = intArr[i];
            }
        }

        int bitsNeeded = (int) BitString.length(greatest);

        // create the BitString[]
        BitString[] retValue = new BitString[intArr.length];
        for (int i = 0; i < retValue.length; i++) {
            retValue[i] = new BitString(bitsNeeded, intArr[i]);
        }

        return retValue;
    }

    /** Constructs a default BitString object of length = 0, bitPattern = 0 */
    public BitString() {
        length = 0;
        bitPattern = 0;
    }

    /**
     * Constructs a BitString object to represent bp
     *
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2)
     */
    public BitString(String bp) throws NumberFormatException {
        bitPattern = Integer.parseInt(bp, 2);
        length = (short) bp.length();
    }

    /**
     * Constructs a BitString object of length len with the bitPattern attribute
     * represented by bp.
     *
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2)
     */
    public BitString(short len, String bp) throws NumberFormatException {
        reInit(len, Integer.parseInt(bp, 2));
    }

    /** Constructs a BitString object of length len and bitPattern bp. */
    public BitString(short len, int bp) {
        reInit(len, bp);
    }

    /** Constructs a BitString object of length len and bitPattern 0. */
    public BitString(short len) {
        reInit(len, 0);
    }

    /** Constructs a BitString object of minimum length needed to properly represent bp. */
    public BitString(int bp) {
        bitPattern = bp;
        length = length(bp);
    }

    /** Constructs a BitString object that is a copy of bp. */
    public BitString(BitString bp) {
        reInit(bp);
    }

    /** allows package classes to create BitStrings without bounds checking */
    BitString(int len, int bp) {
        bitPattern = bp;
        length = (short) len;
    }

    /** Returns true if the bit at bit position index is a 1. */
    public final boolean bitAt(int index) throws IndexOutOfBoundsException {
        boundsCheck(index);
        return (MASK[index] & bitPattern) != 0;
    }

    /** Sets the bit at bit position index to 1. */
    public final void setBit(int index) throws IndexOutOfBoundsException {
        boundsCheck(index);
        bitPattern |= MASK[index];
    }

    /** Clears the bit at bit position index */
    public final void clearBit(int index) throws IndexOutOfBoundsException {
        boundsCheck(index);
        bitPattern &= ~MASK[index];
    }

    /** Sets the bit at bit position index if value is true, else clears the bit. */
    public final void assignBit(int index, boolean value) throws IndexOutOfBoundsException {
        boundsCheck(index);
        if (value == true) {
            bitPattern |= MASK[index];
        } else {
            bitPattern &= ~MASK[index];
        }
    }

    /** Returns the length of this BitString. */
    public final short length() {
        return length;
    }

    /**
     * Sets the length of this BitString to len. This may result in truncation if
     * len < this.length.
     *
     * @throws LengthOutOfBoundsException if len < 0 or len > BitString.MAXLEN.
     */
    public final void setLength(short len) throws LengthOutOfBoundsException {
        reInit(len, bitPattern);
    }

    /** Returns this BitString's bitPattern attribute. */
    public final int bitPattern() {
        return bitPattern;
    }

    /**
     * Sets this BitString's bitPattern attribute to bp. Note, if this.length is not
     * great enough bp may be truncated.
     */
    public final void setBitPattern(int bp) {
        reInit(length, bp);
    }

    /**
     * Sets this BitString's bitPattern to represent that of bp. Note, if this.length is
     * not great enough bp may be truncated.
     *
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2).
     */
    public final void setBitPattern(String bp) throws NumberFormatException {
        reInit(length, Integer.parseInt(bp, 2));
    }

    /** allows package classes to reInit BitStrings without bounds checking */
    final void reInit(int len, int bp) {
        length = (short) len;
        bitPattern = bp;
    }

    /**
     * Reinitializes this BitString to length len and bitPattern bp
     *
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2)
     */
    public final void reInit(short len, String bp) throws NumberFormatException {
        reInit(len, Integer.parseInt(bp, 2));
    }

    /**
     * Reinitializes this BitString to length len and bitPattern bp
     *
     * @throws LengthOutOfBoundsException if len < 0 OR len > MAXLEN.
     */
    public final void reInit(short len, int bp) throws LengthOutOfBoundsException {
        if (len < 0 || len > MAXLEN) {
            throw new LengthOutOfBoundsException();
        } else if (len == 0) {
            length = 0;
            bitPattern = 0;
        } else {
            length = len;
            bitPattern = bp & MASK_R[length - 1];
        }
    }

    /** Reinitializes this BitString to bp */
    public final void reInit(BitString bp) {
        length = bp.length;
        bitPattern = bp.bitPattern;
    }

    /**
     * AND's this BitString with bp. Only those bits that overlap between this BitString
     * and bp are affected
     */
    public final void and(BitString bp) {
        bitPattern &= (MASK_L[bp.length] | bp.bitPattern);
    }

    /**
     * OR's this BitString with bp. Only those bits that overlap between this BitString
     * and bp are affected
     */
    public final void or(BitString bp) {
        if (length == 0) return;
        bitPattern |= (bp.bitPattern & MASK_R[length - 1]);
    }

    /**
     * Logically shifts the bits of this BitString one position to the left. If wrapOn
     * is TRUE then the most significant bit will be shifted into the least significant
     * bit position.
     */
    public final int lShift(boolean wrapOn) {
        if (length == 0) {
            return 0;
        }
        int highBit = (MASK[length - 1] & bitPattern) != 0 ? 1 : 0;
        bitPattern <<= 1;
        bitPattern &= MASK_R[length - 1];
        if (wrapOn) {
            bitPattern |= highBit;
        }
        return highBit;
    }

    /**
     * Logically shifts the bits of this BitString numBits % this.length to the left.
     * If wrapOn is TRUE then bits are shifted in from the right as they are shifted
     * out from the left.
     */
    public final void lShift(int numBits, boolean wrapOn) {
        if (numBits < 0) {
            rShift(numBits * -1, wrapOn);
        } else if (numBits == 0 || bitPattern == 0) {
            ; // nothing to shift
        } else if (!wrapOn) {
            // lShift with true has bug
            bitPattern <<= numBits;
            bitPattern &= MASK_R[length - 1];
        } else {
            int nBits = numBits % length;
            if (nBits != 0) { // a rotation by a multiple of length is a no-op
                int m = bitPattern & (MASK_R[length - 1] & MASK_L[length - nBits]);
                bitPattern &= ~m;
                bitPattern <<= nBits;
                bitPattern |= (m >>> (length - nBits));
            }
        }
    }

    /**
     * Logically shifts the bits of this BitString one position to the right. If wrapOn
     * is TRUE then the least significant bit will be shifted into the most significant
     * bit position.
     */
    public final int rShift(boolean wrapOn) {
        int lowBit = bitPattern & 1;
        bitPattern >>>= 1;
        if (wrapOn && lowBit == 1) {
            bitPattern |= MASK[length - 1];
        }
        return lowBit;
    }

    /**
     * Logically shifts the bits of this BitString numBits % this.length to the right.
     * If wrapOn is TRUE then bits are shifted in from the left as they are shifted
     * out from the right.
     */
    public final void rShift(int numBits, boolean wrapOn) {
        if (numBits < 0) {
            lShift(numBits * -1, wrapOn);
        } else if (numBits == 0 || bitPattern == 0) {
            ; // nothing to shift
        } else if (!wrapOn) {
            bitPattern >>>= numBits;
        } else {
            int nBits = numBits % length;
            if (nBits != 0) { // a rotation by a multiple of length is a no-op
                int m = bitPattern & MASK_R[nBits - 1];
                bitPattern >>>= nBits;
                bitPattern |= (m << (length - nBits));
            }
        }
    }

    /**
     * Returns the position of the least significant 1 in this BitString,
     * or -1 if no bit is set.
     */
    public final int leastSig1() {
        int bp = bitPattern;
        int i = 0;
        // stop at MAXLEN so that an all-zero pattern cannot loop forever
        while (i < MAXLEN && (bp & 1) == 0) {
            bp >>>= 1;
            i++;
        }
        return i < MAXLEN ? i : -1;
    }

    /**
     * Concatenates rVal to this BitString.
     *
     * @throws LengthOutOfBoundsException if length + rVal.length > BitString.MAXLEN.
     */
    public final void concat(BitString rVal) throws LengthOutOfBoundsException {
        short concatLen = (short) (length + rVal.length);
        if (concatLen > MAXLEN) {
            throw new LengthOutOfBoundsException(
                "Concatenated result would exceed " + MAXLEN);
        }
        bitPattern <<= rVal.length;
        bitPattern |= rVal.bitPattern;
        length = concatLen;
    }

    public int hashCode() {
        return bitPattern;
    }

    public String toString() {
        if (length == 0) return "[0]";
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[length - 1 - i] = bitAt(i) ? '1' : '0';
        }
        return String.valueOf(out);
    }

    public final int compareTo(BitString bs) {
        if (length != bs.length) {
            // the longer BitString is always considered larger
            return length - bs.length;
        } else if ((bitPattern | bs.bitPattern) < 0) {
            // two MAXLEN BitStrings, one or both have their high bits set
            if (bitPattern < 0 && bs.bitPattern >= 0) {
                // this BitString negative, rhs non-negative
                return 1;
            } else if (bitPattern >= 0 && bs.bitPattern < 0) {
                // this BitString non-negative, rhs negative
                return -1;
            } else {
                // both negative
                return (bitPattern & Integer.MAX_VALUE) -
                       (bs.bitPattern & Integer.MAX_VALUE);
            }
        } else {
            // two equal-length BitStrings w/o their high bits set
            return bitPattern - bs.bitPattern;
        }
    }

    public final int compareTo(Object obj) {
        return compareTo((BitString) obj);
    }

    public final boolean equals(Object obj) {
        if ((obj != null) && (obj instanceof BitString)) {
            BitString bs = (BitString) obj;
            return (length == bs.length) && (bitPattern == bs.bitPattern);
        }
        return false;
    }

    private final void boundsCheck(int index) throws IndexOutOfBoundsException {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
    }
}
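
The wrap-around branch of rShift can be tried in isolation. The listing below is a minimal sketch under stated assumptions, not the thesis code: the class name RotateSketch and the method rotateRight are hypothetical, the bit pattern is passed in as a plain int together with its length, and the mask is computed on the fly instead of being read from MASK_R so that a rotation by a multiple of the length is simply a no-op.

```java
final class RotateSketch {
    /**
     * Rotates the low `length` bits of pattern right by numBits.
     * Assumes 1 <= length <= 32, numBits >= 0, and that no bits
     * above position length - 1 are set in pattern.
     */
    static int rotateRight(int pattern, int length, int numBits) {
        int nBits = numBits % length;
        if (nBits == 0) {
            return pattern;                 // full rotation: nothing changes
        }
        int lowMask = (1 << nBits) - 1;     // selects the bits that fall off the right
        int m = pattern & lowMask;
        pattern >>>= nBits;                 // shift out on the right
        pattern |= (m << (length - nBits)); // shift the saved bits back in on the left
        return pattern;
    }

    public static void main(String[] args) {
        // 1011 rotated right by one position within 4 bits
        System.out.println(Integer.toBinaryString(rotateRight(0b1011, 4, 1)));
    }
}
```

Rotating by length - k positions to the right gives the same result as rotating k positions to the left, which is why a single routine is enough to enumerate all rotations of a necklace.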


package thesis.compression.multi_tree;

import java.io.*;
import java.util.*;

/**
 * Objects of this class are used to read data from a file. Data is read in chunks
 * from 1 to BitString.MAXLEN bits.
 */

public class BitStringReader implements ByteMask {
    private static final int BUF_SIZE = 8192;
    private static final int DEFAULT_N = 8;

    private byte[] buf;                // holds bytes read in from inStream
    private int endOfBuf;              // first unusable byte in buf
    private int oldEndOfBuf;           // remembers endOfBuf in case reset() needs it
    private int bufFilled;             // # times this buf was filled (and overwritten)
    private long peekSizeAdj;          // corrects 'bits read' inaccuracy induced by peekBitString()
    private String filename;           // name of the file to be opened / reopened
    private FileInputStream inStream;  // underlying data stream
    private int n;                     // default BitString length to read
    private int bytePos;               // byte marker within current buf
    private int bitPos;                // bit marker within current byte
    private BitString scratch;         // helps avoid unnecessary object creation
    private long numDecoded;           // # Huffman codes decoded with this reader
    private long decodeLimit;          // max # Huffman codes to decode with this reader

    /**
     * Constructs an instance that reads from file filename and uses n as the
     * default number of bits to read for calls to readBitString().
     */
    public BitStringReader(String filename, int n)
        throws
            FileNotFoundException,
            IOException,
            LengthOutOfBoundsException
    {
        buf = new byte[BUF_SIZE];
        inStream = new FileInputStream(filename);
        this.filename = filename;
        setN(n);
        endOfBuf = inStream.read(buf);
        oldEndOfBuf = endOfBuf;
        bufFilled = bytePos = 0;
        bitPos = 7;
        scratch = new BitString();
        numDecoded = peekSizeAdj = 0;
        decodeLimit = Long.MAX_VALUE;
    }

    /**
     * Constructs an instance that reads from file filename and uses DEFAULT_N as the
     * default number of bits to read for calls to readBitString().
     */
    public BitStringReader(String filename)
        throws
            FileNotFoundException,
            IOException,
            LengthOutOfBoundsException
    {
        this(filename, DEFAULT_N);
    }

    /**
     * Changes the number of bits read by a call to readBitString() from the current
     * value to n.
     *
     * @throws LengthOutOfBoundsException if n < 0 or n > BitString.MAXLEN.
     */
    public final void setN(int n) throws LengthOutOfBoundsException {
        if (n < 0 || n > BitString.MAXLEN) {
            throw new LengthOutOfBoundsException();
        }
        this.n = n;
    }

    /** Sets the number of BitStrings readable from a file using decodeBitString(). */
    public final void setDecodeLimit(long limit) {
        decodeLimit = limit;
    }


    /** Returns the number of bits read by this BitStringReader. */
    public final long bitsRead() {
        return (long) bufFilled * (long) BUF_SIZE * 8L + (long) bytePos * 8L +
               7L - (long) bitPos + (long) peekSizeAdj * 8L;
    }

    /** Returns the filename of the file being read by this BitStringReader. */
    public final String filename() {
        return filename;
    }


    /**
     * Returns a BitString that represents the next n bits of the file being
     * read by this BitStringReader. n is an already set instance variable of
     * this BitStringReader. Thus, this is the default read operation, which is useful
     * if you need to read fixed-length bit strings from a source file. Note: if there
     * are only x bits left in the file where x < n then a BitString of length x will be
     * returned. Thus, a returned BitString of length < n is a signal that the EOF has
     * been reached. Further calls to this method after EOF will return BitStrings of
     * length 0.
     */
    public final BitString readBitString() throws IOException {
        return readBitString(n);
    }

    /**
     * Returns a BitString that represents the next len bits of the file being
     * read by this BitStringReader. Note: if there are only x bits left in the file
     * where x < len then a BitString of length x will be returned. Thus, a returned
     * BitString of length < len is a signal that the EOF has been reached. Further calls
     * to this method after EOF will return BitStrings of length 0.
     */
    public BitString readBitString(int len)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        int bitsNeeded = len;
        BitString temp = scratch;
        BitString bs = new BitString((short) bitsNeeded);
        while (bitsNeeded > 0) {
            if (bytePos == endOfBuf) {
                // at the end of the current buffer
                endOfBuf = inStream.read(buf);
                if (endOfBuf == -1) {
                    endOfBuf = bytePos;
                    bs.reInit(len - bitsNeeded, bs.bitPattern());
                    return bs;
                } else {
                    bufFilled++;
                    bytePos = 0;
                }
            } else if (bitsNeeded > bitPos) {
                // need all of the current byte
                int pieceLen = bitPos + 1;
                bs.lShift(pieceLen, false);
                temp.reInit(pieceLen, buf[bytePos] & MASK_R[bitPos]);
                bs.or(temp);
                bitsNeeded -= pieceLen;
                bytePos++;
                bitPos = 7;
            } else {
                // need only some of the current byte
                bs.lShift(bitsNeeded, false);
                int t = (buf[bytePos] & MASK_R[bitPos]) >>> (bitPos - bitsNeeded + 1);
                temp.reInit(bitsNeeded, t);
                bs.or(temp);
                bitPos -= bitsNeeded;
                bitsNeeded = 0;
            }
        }
        return bs;
    }

    /**
     * This method is identical in functionality to readBitString(len) except that
     * the underlying state of this BitStringReader remains unchanged after each call.
     * Thus, multiple calls to this method are guaranteed to return the same result
     * so long as no intermediate calls to readBitString or decodeBitString are made.
     * This method is useful to determine what is coming up in the input stream without
     * actually advancing the input stream pointer.
     */
    public BitString peekBitString(int len)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        int bitsNeeded = len;
        BitString temp = scratch;
        BitString bs = new BitString((short) bitsNeeded);
        int bitPos = this.bitPos;
        int bytePos = this.bytePos;
        int endOfBuf = this.endOfBuf;
        while (bitsNeeded > 0) {
            if (bytePos == endOfBuf) {
                // at the end of the current buffer
                int bytesFromOldBuf = this.endOfBuf - this.bytePos;
                System.arraycopy(buf, this.bytePos, buf, 0, bytesFromOldBuf);
                endOfBuf = inStream.read(buf, bytesFromOldBuf, BUF_SIZE - bytesFromOldBuf) +
                           bytesFromOldBuf;
                bufFilled++;
                peekSizeAdj -= (BUF_SIZE - this.bytePos);
                if (endOfBuf < bytesFromOldBuf) {
                    // no more bytes to read from source file
                    this.endOfBuf = bytesFromOldBuf;
                    this.bytePos = 0;
                    bs.reInit(len - bitsNeeded, bs.bitPattern());
                    return bs;
                } else {
                    this.endOfBuf = endOfBuf;
                    this.bytePos = 0;
                    bytePos = bytesFromOldBuf;
                }
            } else if (bitsNeeded > bitPos) {
                // need all of the current byte
                int pieceLen = bitPos + 1;
                bs.lShift(pieceLen, false);
                temp.reInit(pieceLen, buf[bytePos] & MASK_R[bitPos]);
                bs.or(temp);
                bitsNeeded -= pieceLen;
                bytePos++;
                bitPos = 7;
            } else {
                // need only some of the current byte
                bs.lShift(bitsNeeded, false);
                int t = (buf[bytePos] & MASK_R[bitPos]) >>> (bitPos - bitsNeeded + 1);
                temp.reInit(bitsNeeded, t);
                bs.or(temp);
                bitPos -= bitsNeeded;
                bitsNeeded = 0;
            }
        }
        return bs;
    }

    /**
     * Reads lenOfShortestHufCode bits from the input stream and checks to see if this
     * key is found in decodingMap. If it is then a BitString representing the value
     * mapped to by the key is returned. Else, another bit is concatenated to the
     * current key and another lookup in decodingMap is performed. If the new key
     * is found in decodingMap then a BitString representing the value mapped to by
     * the new key is returned. This process continues until a BitString is returned,
     * the key exceeds BitString.MAXLEN (which throws a LengthOutOfBoundsException), or
     * there are no more bits to read from the input stream (which throws an
     * AssertionException).
     */
    public final BitString decodeBitString(HashMap decodingMap,
                                           int lenOfShortestHufCode)
        throws
            IOException
    {
        if (numDecoded == decodeLimit) {
            return new BitString();
        }
        BitString hufCode = readBitString(lenOfShortestHufCode);
        BitString aBit;
        while (!decodingMap.containsKey(hufCode)) {
            aBit = readBitString(1);
            Assertion.assert(aBit.length() == 1);
            hufCode.concat(aBit);
        }
        numDecoded++;

        // returning the VALUE that hufCode maps to in the decoding map (its class index)
        // reusing hufCode for return value to avoid object creation
        hufCode.reInit((BitString) decodingMap.get(hufCode));
        return hufCode;
    }


    // PRE:  0 < nBound <= BitString.MAXLEN
    // POST: returns either,
    //       1) a Lyndon word of length <= nBound
    //       2) a contiguous string of 0's of length <= nBound
    //       3) a contiguous string of 1's of length == nBound
    //       4) any BitString of length <= nBound iff it is the tail of the source file
    //       5) a BitString of length 0 iff no bits remain in the source file
    public BitString readLyndonWord(int nBound)
        throws
            IOException
    {
        BitString sourceBits = peekBitString(nBound);
        if (sourceBits.length() < nBound) {
            // remainder bits, case 4 or 5
            return readBitString(sourceBits.length());
        }

        int returnLen = 0;
        int lenSoFar = 1;
        int aBit = sourceBits.lShift(false);
        if (aBit == 0) {
            // a leading 0, case 2
            aBit = sourceBits.lShift(false);
            while (aBit == 0 && lenSoFar < nBound) {
                lenSoFar++;
                aBit = sourceBits.lShift(false);
            }
            returnLen = lenSoFar;
        } else {
            // a leading 1, case 1 or 3
            aBit = sourceBits.lShift(false);
            while (aBit == 1 && lenSoFar < nBound) {
                lenSoFar++;
                aBit = sourceBits.lShift(false);
            }

            if (lenSoFar == nBound) {
                // case 3
                returnLen = lenSoFar;
            } else {
                // case 1
                int numLead1s = lenSoFar++;
                returnLen = lenSoFar;
                int numNonLead1s = 0;
                aBit = sourceBits.lShift(false);
                while (numNonLead1s < numLead1s && lenSoFar < nBound) {
                    lenSoFar++;
                    if (aBit == 0) {
                        returnLen = lenSoFar;
                        numNonLead1s = 0;
                    } else {
                        numNonLead1s++;
                    }
                    aBit = sourceBits.lShift(false);
                }
            }
        }
        return readBitString(returnLen);
    }

    /** Resets the input stream of this BitStringReader to the first bit in the file. */
    public void reset() throws IOException {
        if (bufFilled > 0) {
            inStream.close();
            inStream = null;
            System.runFinalization();
            System.gc();
            inStream = new FileInputStream(filename);
            endOfBuf = oldEndOfBuf = inStream.read(buf);
            bufFilled = 0;
            peekSizeAdj = 0;
        } else {
            endOfBuf = oldEndOfBuf;
        }
        bytePos = 0;
        bitPos = 7;
        numDecoded = 0;
    }

    /**
     * Closes this BitStringReader.
     */
    public void close() throws IOException {
        inStream.close();
    }
}
package thesis.compression.multi_tree;

import java.io.*;
import java.util.*;

/**
 * Objects of this class are used to write variable length BitStrings to a file.
 * Upon closing the BitStringWriter any bits beyond the last byte boundary will be
 * padded with enough 0 bits to make the total number of bits written to the
 * target file divisible by 8.
 */

public class BitStringWriter implements ByteMask {
    private static final int BUF_SIZE = 8192;
    private byte[] buf;
    private FileOutputStream outStream;
    private int bufFilled;             // # times buf was filled
    private int bytePos;
    private int bitPos;
    private BitString scratch;

    /** Constructs a BitStringWriter object to write to file 'file'. */
    public BitStringWriter(String file) throws FileNotFoundException {
        buf = new byte[BUF_SIZE];
        outStream = new FileOutputStream(file);
        bufFilled = 0;
        bytePos = 0;
        bitPos = 7;
        scratch = new BitString();
    }

    /** Returns the number of bits written by this BitStringWriter. */
    public long bitsWrote() {
        return (long) bufFilled * (long) BUF_SIZE * 8L + (long) bytePos * 8L +
               7L - (long) bitPos;
    }

    /** Writes BitString bStr to the output stream */
    public void writeBitString(BitString bStr) throws IOException {
        BitString bs = scratch;
        bs.reInit(bStr);
        while (bs.length() > 0) {
            if (bytePos == BUF_SIZE) {
                // buffer is full
                outStream.write(buf);
                Arrays.fill(buf, (byte) 0);
                bytePos = 0;
                bufFilled++;
            } else if (bs.length() > bitPos) {
                // need all of the current byte
                int bitsAvail = bitPos + 1;
                byte b = (byte) (bs.bitPattern() >>> bs.length() - bitsAvail);
                buf[bytePos] |= b;
                bs.setLength((short) (bs.length() - bitsAvail));
                bytePos++;
                bitPos = 7;
            } else {
                // need only some of the current byte
                byte b = (byte) (bs.bitPattern() << bitPos - bs.length() + 1);
                buf[bytePos] |= b;
                bitPos -= bs.length();
                bs.setLength((short) 0);
            }
        }
    }

    /**
     * Convenience method that writes all the BitStrings of BitString[] bsArr to
     * the output stream in index order
     */
    public void writeBitString(BitString[] bsArr) throws IOException {
        for (int i = 0; i < bsArr.length; i++) {
            writeBitString(bsArr[i]);
        }
    }

    /**
     * Closes this BitStringWriter.
     */
    public void close() throws IOException {
        outStream.write(buf, 0, bytePos + (bitPos == 7 ? 0 : 1));
        outStream.flush();
        outStream.close();
        outStream = null;
        System.runFinalization();
        System.gc();
    }

} 
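
The packing rule used by BitStringWriter (fill each byte from its most significant bit downward, and pad the final byte with 0 bits on close) can be illustrated without the BitString class. The following is a minimal sketch under stated assumptions: BitPackSketch, write and close are hypothetical names, bit strings are passed as an int pattern plus a length, bits are written one at a time rather than in byte-sized pieces, and an in-memory ByteArrayOutputStream stands in for the file.

```java
import java.io.ByteArrayOutputStream;

final class BitPackSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // byte currently being assembled
    private int bitPos = 7;  // next free bit in current; 7 is the most significant

    /** Appends the low `length` bits of pattern, most significant bit first. */
    void write(int pattern, int length) {
        for (int i = length - 1; i >= 0; i--) {
            int bit = (pattern >>> i) & 1;
            current |= bit << bitPos;
            if (--bitPos < 0) {
                // the byte is full: emit it and start a fresh one
                out.write(current);
                current = 0;
                bitPos = 7;
            }
        }
    }

    /** Emits a partially filled last byte (padded with 0 bits) and returns all bytes. */
    byte[] close() {
        if (bitPos != 7) {
            out.write(current);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BitPackSketch w = new BitPackSketch();
        w.write(0b101, 3);
        w.write(0b11111, 5);
        w.write(0b1, 1);
        for (byte b : w.close()) {
            System.out.println(Integer.toHexString(b & 0xFF));
        }
    }
}
```

Because of the padding, a decoder cannot tell from the file length alone where the data ends, which is presumably why the compression header also records the number of bit strings that were encoded.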


package thesis.compression.multi_tree;

public interface ByteMask {
    public static final int[] MASK_R = {0x00000001, 0x00000003,
                                        0x00000007, 0x0000000F,
                                        0x0000001F, 0x0000003F,
                                        0x0000007F, 0x000000FF};

    public static final int[] MASK_L = {0x000000FF, 0x000000FE,
                                        0x000000FC, 0x000000F8,
                                        0x000000F0, 0x000000E0,
                                        0x000000C0, 0x00000080};

} 
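
The two tables in ByteMask follow a simple pattern: entry i of MASK_R keeps the i + 1 rightmost bits of a byte, and entry i of MASK_L clears the i rightmost bits. As a rough cross-check of the constants, the sketch below computes the same values arithmetically. MaskSketch, maskR and maskL are hypothetical names introduced only for this illustration.

```java
final class MaskSketch {
    /** Mask keeping bits 0..i of a byte (i in 0..7), e.g. i = 3 gives 0x0F. */
    static int maskR(int i) {
        return (1 << (i + 1)) - 1;
    }

    /** Mask keeping bits i..7 of a byte (i in 0..7), e.g. i = 3 gives 0xF8. */
    static int maskL(int i) {
        return (0xFF << i) & 0xFF;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(i + ": R=0x" + Integer.toHexString(maskR(i))
                + " L=0x" + Integer.toHexString(maskL(i)));
        }
    }
}
```

A lookup table avoids recomputing the shift on every byte read, at the cost of an array access; either form gives the same masks.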


package thesis.compression.huffman;

import java.util.*;
import java.io.*;
import thesis.compression.multi_tree.*;

/**
 * An instance of this class compresses a sourceFile using standard Huffman encoding
 * and n-bit input symbols.
 */
public class HuffmanCompressionManager {
    HashMap freqTable;
    BitStringReader sourceFile;
    BitStringWriter targetFile;
    int n;
    BitString remainder;
    BitString offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanCompressionManager " +
            "<sourceFilename> <targetFilename> <n> <offset>");
        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            HuffmanCompressionManager hcm =
                new HuffmanCompressionManager(sourceFile, targetFile, n, offset);
            hcm.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN + " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanCompressionManager(BitStringReader sourceFile,
                                     BitStringWriter targetFile,
                                     int n,
                                     int offsetLen)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        // bounds check
        if (n < 2 || n > BitString.MAXLEN || offsetLen < 0 || offsetLen >= n) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.n = n;
        sourceFile.setN(n);
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        freqTable = new HashMap();
    }

    public void compress()
        throws
            IOException
    {
        buildFreqTable(sourceFile, freqTable, n, remainder);
        compressSourceFile(targetFile, sourceFile, freqTable, n, remainder, offset);
    }

    public final long sizeAfter() {
        return targetFile.bitsWrote();
    }

    protected void buildFreqTable(/* IN  */ BitStringReader sourceFile,
                                  /* OUT */ HashMap freqTable,
                                  /* IN  */ int n,
                                  /* OUT */ BitString remainder)
        throws
            IOException
    {
        BitString bs = sourceFile.readBitString();
        while (bs.length() == n) {
            if (freqTable.containsKey(bs)) {
                ((VonNoymanNode) (freqTable.get(bs))).incrFreq();
            } else {
                freqTable.put(bs, new VonNoymanNode(bs));
            }
            bs = sourceFile.readBitString();
        }
        remainder.reInit(bs);
    }

    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap freqTable,
                                      /* IN     */ int n,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws
            IOException
    {
        // write compression header
        writeCompressionHeader(targetFile, sourceFile, n, freqTable, remainder, offset);

        HashMap encodingMap = freqTable;
        sourceFile.reset();
        sourceFile.readBitString(offset.length());  // throw away offset

        // write sourceFile to targetFile in compressed form
        BitString bs = sourceFile.readBitString();
        while (bs.length() == n) {
            bs = (BitString) encodingMap.get(bs);
            targetFile.writeBitString(bs);
            bs = sourceFile.readBitString();
        }
        Assertion.assert(remainder.equals(bs));
        targetFile.close();
    }


    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int n,
                                          /* IN/OUT */ HashMap freqTable,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws
            IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString((short) 5, n));
        targetFile.writeBitString(new BitString((short) 5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString((short) 5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert the HashMap from a frequency table to an encoding map and write
        // header info for the Huffman tree
        processTree(freqTable, targetFile);

        // write numBitStringsRead as last entry in header
        long numBitStringsRead = (sourceFile.bitsRead() - offset.length()) / n;
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString((short) 32, (int) numBitStringsRead));

        System.out.println("****** HUFFMAN HEADER SIZE FOR " + sourceFile.filename() +
            " with N = " + n + " is " + (targetFile.bitsWrote() / 8) +
            " bytes ******");
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws
            IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short) 5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered bit string list for the compression header
        BitString[] freqOrderedBitStringList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedBitStringList[i++] = ((VonNoymanNode) it.next()).index;
        }

        // build leafs at level list for the compression header
        // NOTE: this algorithm trashes its TreeSet parameter AND the underlying
        // HashMap upon which it is based
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel = new BitString((short) 5,
            leafsAtLevelList[0].length());
        BitString numLevels = new BitString((short) 5, leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedBitStringList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (class index --> Huffman code) encoding map
        HashMap bitStringToHuffmanCodeMap = freqTable;
        for (int j = 0; j < freqOrderedBitStringList.length; j++) {
            bitStringToHuffmanCodeMap.put(
                freqOrderedBitStringList[j],
                freqOrderedHuffmanCodes[j]);
        }
    }

} 
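
HuffmanCompressionManager delegates tree construction to VonNoymanNode.vonNoymanAlgorithm, whose source is not part of this listing. To make the step from a frequency table to code lengths concrete, the sketch below uses the textbook priority-queue construction instead. It is an illustration only, not the thesis algorithm: HuffmanLengthsSketch and codeLengths are hypothetical names, and symbols are plain strings rather than BitStrings.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

final class HuffmanLengthsSketch {
    /** Tree node; leaves carry a symbol, internal nodes carry two children. */
    private static final class Node {
        final long freq;
        final String symbol;
        final Node left;
        final Node right;

        Node(long freq, String symbol, Node left, Node right) {
            this.freq = freq;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the Huffman code length, in bits, of every symbol in freq. */
    static Map<String, Integer> codeLengths(Map<String, Long> freq) {
        Map<String, Integer> lengths = new HashMap<String, Integer>();
        if (freq.isEmpty()) {
            return lengths;
        }
        PriorityQueue<Node> pq =
            new PriorityQueue<Node>(Comparator.comparingLong((Node n) -> n.freq));
        for (Map.Entry<String, Long> e : freq.entrySet()) {
            pq.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        // repeatedly merge the two least frequent subtrees
        while (pq.size() > 1) {
            Node a = pq.poll();
            Node b = pq.poll();
            pq.add(new Node(a.freq + b.freq, null, a, b));
        }
        assign(pq.poll(), 0, lengths);
        return lengths;
    }

    /** Records the depth of every leaf below node. */
    private static void assign(Node node, int depth, Map<String, Integer> out) {
        if (node.left == null) {
            out.put(node.symbol, Math.max(depth, 1)); // a lone symbol still needs 1 bit
            return;
        }
        assign(node.left, depth + 1, out);
        assign(node.right, depth + 1, out);
    }

    public static void main(String[] args) {
        Map<String, Long> freq = new HashMap<String, Long>();
        freq.put("a", 5L);
        freq.put("b", 2L);
        freq.put("c", 1L);
        freq.put("d", 1L);
        System.out.println(new TreeMap<String, Integer>(codeLengths(freq)));
    }
}
```

Counting how many leaves fall on each level of the resulting tree gives the kind of per-level summary that processTree writes to the header as leafsAtLevelList, from which both sides can rebuild identical codes.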


package thesis.compression.huffman2;

import java.util.*;
import java.io.*;
import thesis.compression.multi_tree.*;
import thesis.compression.huffman.*;

/**
 * This class is identical to HuffmanCompressionManager except that it uses a slightly
 * more efficient header encoding format. The performance difference between the two
 * techniques is typically insignificant (<0.1%).
 */
public class HuffmanCompressionManager2 extends HuffmanCompressionManager {

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanCompressionManager2 " +
            "<sourceFilename> <targetFilename> <n> <offset>");

        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            HuffmanCompressionManager2 hcm2 =
                new HuffmanCompressionManager2(sourceFile, targetFile, n, offset);
            hcm2.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN + " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanCompressionManager2(BitStringReader sourceFile,
                                      BitStringWriter targetFile,
                                      int n,
                                      int offsetLen)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, n, offsetLen);
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws
            IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short) 5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered jWord list for the compression header
        BitString[] freqOrderedJWordList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] = ((VonNoymanNode) it.next()).index;
        }

        // build and trim leafs at level list for the compression header
        // NOTE: the VonNoyman algorithm trashes its TreeSet parameter AND
        // the underlying HashMap upon which it is based
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);
        BitString[] trimedLeafsAtLevelList = trimEmptyLeadLevels(leafsAtLevelList);
        int firstLevel = leafsAtLevelList.length - trimedLeafsAtLevelList.length;

        // write this tree's portion of the compression header
        BitString numNonEmptyLevels = new BitString((short) 5,
            trimedLeafsAtLevelList.length);
        BitString firstNonEmptyLevel = new BitString((short) 5, firstLevel);
        BitString leafsPerLevel = new BitString((short) 5,
            trimedLeafsAtLevelList[0].length());
        targetFile.writeBitString(numNonEmptyLevels);
        targetFile.writeBitString(firstNonEmptyLevel);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(trimedLeafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);
        Assertion.assert(freqOrderedJWordList.length ==
            freqOrderedHuffmanCodes.length);

        // reuse the trashed HashMap as a (j-word --> Huffman code) encoding map
        HashMap JWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            JWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                freqOrderedHuffmanCodes[k]);
        }
    }

    private final BitString[] trimEmptyLeadLevels(BitString[] leafsAtLevelList) {
        int numNonEmptyLevels = leafsAtLevelList.length;
        int i = 0;
        while (leafsAtLevelList[i].bitPattern() == 0) {
            numNonEmptyLevels--;
            i++;
        }
        BitString[] newLeafsAtLevelList = new BitString[numNonEmptyLevels];
        System.arraycopy(leafsAtLevelList, i, newLeafsAtLevelList, 0,
            numNonEmptyLevels);
        return newLeafsAtLevelList;
    }
}


package thesis.compression.huffman;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;
import thesis.compression.multi_tree.*;

/**
 * An instance of this class decompresses a file compressed by an instance of
 * HuffmanCompressionManager.
 */
public class HuffmanDecompressionManager {
    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap decodingMap;
    private int n;
    private BitString remainder;
    private BitString offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanDecompressionManager " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new HuffmanDecompressionManager(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanDecompressionManager(BitStringReader sourceFile,
                                       BitStringWriter targetFile)
        throws
            IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        n = (sourceFile.readBitString(5)).bitPattern();
        if (n == 0) {
            n = 32;
        }
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);
        decodingMap = new HashMap();
        decompressSourceFile(targetFile, sourceFile, n, decodingMap,
            remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN  */ BitStringReader sourceFile,
                                        /* IN  */ int n,
                                        /* OUT */ HashMap decodingMap,
                                        /* IN  */ BitString remainder,
                                        /* IN  */ BitString offset)
        throws
            IOException
    {
        targetFile.writeBitString(offset);

        // build decoding map (Huffman code ==> original bit string)
        // remember length of shortest Huffman code
        IntHolder lengthOfShortHufCode = new IntHolder();
        buildDecodingMap(sourceFile, n, decodingMap, lengthOfShortHufCode);
        int lenOfShortHufCode = lengthOfShortHufCode.value;
        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(numBitStringsCompressed);
        sourceFile.setN(n);
        BitString bs = sourceFile.decodeBitString(decodingMap, lenOfShortHufCode);
        while (bs.length() > 0) {
            targetFile.writeBitString(bs);
            bs = sourceFile.decodeBitString(decodingMap, lenOfShortHufCode);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }

    protected void buildDecodingMap(/* IN  */ BitStringReader sourceFile,
                                    /* IN  */ int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable)
        throws IOException
    {
        // build leafsAtLevelList for getHuffmanCodes method
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = (sourceFile.readBitString(5)).bitPattern();
        BitString[] leafsAtLevelList = new BitString[numLevels];
        for (int i = 0; i < numLevels; i++) {
            leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
        }

        // getHuffmanCodes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code --> unencoded bit string) mapping
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                            sourceFile.readBitString(n));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

}
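The decoding loop above relies on BitStringReader.decodeBitString, which is not part of this listing. The following is a minimal sketch of how a lookup driven by a (Huffman code --> bit string) map and the shortest code length might work. It assumes bit strings are modeled as Java Strings of '0' and '1' characters; the class and method names (MapDecodeSketch, decode) are hypothetical and do not appear in the thesis code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decode a bit sequence with a (Huffman code -> symbol) map. */
final class MapDecodeSketch {

    static String decode(String bits, Map<String, String> decodingMap,
                         int shortestCodeLen) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < bits.length()) {
            // no code is shorter than shortestCodeLen, so start the search there
            int len = shortestCodeLen;
            String symbol = null;
            // grow the candidate code one bit at a time until the map has it
            while (pos + len <= bits.length()) {
                symbol = decodingMap.get(bits.substring(pos, pos + len));
                if (symbol != null) {
                    break;
                }
                len++;
            }
            if (symbol == null) {
                throw new IllegalArgumentException("trailing bits form no code");
            }
            out.append(symbol);
            pos += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("0", "00");   // most frequent 2-bit source string
        map.put("10", "01");
        map.put("11", "11");
        System.out.println(decode("010110", map, 1)); // prints 00011100
    }
}
```

Because Huffman codes are prefix-free, the first map hit while growing the candidate is the only possible match, so no backtracking is needed.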


package thesis.compression.huffman2;

import java.util.*;
import java.io.*;

import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;
import thesis.compression.huffman.*;

/**
 * Decompresses files compressed using HuffmanCompressionManager2.
 */
public class HuffmanDecompressionManager2 extends
    HuffmanDecompressionManager {

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanDecompressionManager2 " +
                                  "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new HuffmanDecompressionManager2(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanDecompressionManager2(BitStringReader sourceFile,
                                        BitStringWriter targetFile)
        throws IOException
    {
        super(sourceFile, targetFile);
    }

    protected void buildDecodingMap(/* IN  */ BitStringReader sourceFile,
                                    /* IN  */ int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable)
        throws IOException
    {
        int numNonEmptyLevels =
            (sourceFile.readBitString(5)).bitPattern();
        if (numNonEmptyLevels == 0) {
            return;
        }

        // build leafsAtLevelList for getHuffmanCodes method
        int firstNonEmptyLevel =
            (sourceFile.readBitString(5)).bitPattern();
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = numNonEmptyLevels + firstNonEmptyLevel;
        BitString[] leafsAtLevelList = new BitString[numLevels];
        for (int i = 0; i < numLevels; i++) {
            if (i < firstNonEmptyLevel) {
                leafsAtLevelList[i] = new BitString((short) lenLevel, 0);
            } else {
                leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
            }
        }

        // getHuffmanCodes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code --> unencoded bit string) mapping
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                            sourceFile.readBitString(n));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

}
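Both buildDecodingMap variants rebuild the Huffman codes from nothing more than the number of leafs at each tree level. VonNoymanNode.getHuffmanCodes is not shown in this part of the listing, so the sketch below only illustrates the general idea with a standard canonical code assignment. It assumes plain int counts and String codes instead of BitString objects, and the names CanonicalCodeSketch, codesFromLevels and shortestCodeLength are hypothetical; the thesis implementation may assign codes in a different order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive prefix codes from leaf counts per tree level. */
final class CanonicalCodeSketch {

    /** Render value as exactly width binary digits, most significant first. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    /** leafsAtLevel[d] is the number of leafs (codes) of length d. */
    static List<String> codesFromLevels(int[] leafsAtLevel) {
        List<String> codes = new ArrayList<>();
        int next = 0; // next unused code value at the current level
        for (int level = 0; level < leafsAtLevel.length; level++) {
            for (int k = 0; k < leafsAtLevel[level]; k++) {
                codes.add(toBits(next++, level));
            }
            next <<= 1; // remaining values become internal nodes one level down
        }
        return codes;
    }

    /** Index of the first non-empty level, or -1 if the tree has no leafs. */
    static int shortestCodeLength(int[] leafsAtLevel) {
        for (int i = 0; i < leafsAtLevel.length; i++) {
            if (leafsAtLevel[i] != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 1, 2};
        System.out.println(codesFromLevels(levels));    // [0, 10, 110, 111]
        System.out.println(shortestCodeLength(levels)); // 1
    }
}
```

Codes come out shortest first, so pairing them with a frequency-ordered symbol list gives the most frequent symbol the shortest code, which matches the role of the frequency-ordered lists in the header.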


package thesis.compression.multi_tree;

public class LengthOutOfBoundsException extends RuntimeException {

    public LengthOutOfBoundsException() {
        super("BitString length must be between 0 and " +
              Integer.toString(BitString.MAXLEN) + " (inclusive).");
    }

    public LengthOutOfBoundsException(String str) {
        super(str);
    }

}


package thesis.compression.lyndon;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;

/**
 * Preliminary investigation into the use of Lyndon words to build a dictionary
 * compression technique. Not used in final thesis. Looks promising enough to continue
 * researching.
 */
public class LyndonCompressionManager {

    ArrayList dictionary;
    int[] freq;
    BitStringReader sourceFile;
    BitStringWriter targetFile;
    int nBound;
    BitString remainder;
    // BitString offset;


    public static void main(String[] args) {
        String usage = new String("USAGE: java LyndonCompressionManager " +
                                  "<sourceFilename> <targetFilename> <nBound>");

        try {
            if (args.length != 3) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int nBound = Integer.parseInt(args[2]);
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            LyndonCompressionManager lcm =
                new LyndonCompressionManager(sourceFile, targetFile,
                                             nBound);
            lcm.compress();
        } catch (NumberFormatException arg3NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < nBound <= " + BitString.MAXLEN);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public LyndonCompressionManager(BitStringReader sourceFile,
                                    BitStringWriter targetFile,
                                    int nBound)
        throws IOException,
               LengthOutOfBoundsException
    {
        // bounds check
        if (nBound < 2 || nBound > BitString.MAXLEN) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.nBound = nBound;
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        remainder = new BitString();
        dictionary = new ArrayList();
        freq = new int[100];
        Arrays.fill(freq, 1);
    }

    public void compress()
        throws IOException
    {
        loadDictionary(sourceFile, dictionary, nBound, remainder);
        // compressSourceFile(targetFile, sourceFile, freqTable, n, remainder, offset);
    }

    public void loadDictionary(BitStringReader sourceFile,
                               ArrayList dictionary,
                               int nBound,
                               BitString remainder)
        throws IOException
    {
        int i;
        BitString lynWord1 = null;
        BitString lynWord2 = sourceFile.readLyndonWord(nBound);
        while (lynWord2.length() != 0) {
            i = dictionary.indexOf(lynWord1);
            if (i == -1) {
                dictionary.add(lynWord1);
            } else {
                freq[i]++;
            }
            lynWord1 = lynWord2;
            lynWord2 = sourceFile.readLyndonWord(nBound);
        }
        remainder.reInit(lynWord1);
        printDictionary();
    }

    private void printDictionary() {
        int numBits = 0;
        int size = dictionary.size();
        int logSize = (int) Math.ceil(Math.log(size) / Math.log(2));
        BitString entry;
        for (int i = 1; i < size; i++) {
            entry = (BitString) dictionary.get(i);
            System.out.print(i + ". " + (i < 10 ? " " : ""));
            System.out.print(entry);
            printSpaces(nBound - entry.length() + 3);
            System.out.println(freq[i]);
            numBits += (logSize * freq[i]);
        }
        System.out.println("remainder: " + remainder);
        System.out.println("source bits: " + sourceFile.bitsRead());
        System.out.println("target bits: " + numBits);
    }

    private void printSpaces(int num) {
        for (int i = 0; i < num; i++) {
            System.out.print(" ");
        }
    }

}


MULTITREE COMPRESSION FORMAT

Field                                                                Size
n                                                                    5 bits
bits used Bf to represent # leafs f at each level of Huffman tree    5 bits
bits used Bv to represent # of levels v in Huffman tree              5 bits
leafs in level 0 of Huffman tree                                     Bf bits
leafs in level 1 of Huffman tree                                     Bf bits
...
leafs in level v - 1 of Huffman tree                                 Bf bits
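As a rough illustration of the per-tree fields in the table above, the sketch below packs a list of leaf counts into a string of '0' and '1' characters: a 5-bit field width Bf, a 5-bit level count v, then v counts of Bf bits each. The class TreeHeaderSketch and its methods are hypothetical, and choosing Bf as the smallest width that fits the largest count is an assumption made for this example; the file-level n field is left out.

```java
/** Hypothetical sketch of the per-tree header fields listed in the format table. */
final class TreeHeaderSketch {

    /** Render value as exactly width binary digits, most significant first. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    /** Smallest field width (at least 1) that holds every count. Assumed rule. */
    static int fieldWidth(int[] leafsAtLevel) {
        int max = 0;
        for (int count : leafsAtLevel) {
            max = Math.max(max, count);
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    /** Bf (5 bits), number of levels v (5 bits), then v counts of Bf bits each. */
    static String encodeTreeHeader(int[] leafsAtLevel) {
        int bf = fieldWidth(leafsAtLevel);
        StringBuilder sb = new StringBuilder();
        sb.append(toBits(bf, 5));
        sb.append(toBits(leafsAtLevel.length, 5));
        for (int count : leafsAtLevel) {
            sb.append(toBits(count, bf));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // levels 0..3 hold 0, 1, 1 and 2 leafs: Bf = 2, v = 4
        System.out.println(encodeTreeHeader(new int[] {0, 1, 1, 2}));
        // prints 000100010000010110
    }
}
```

For this four-level tree the per-tree header costs 5 + 5 + 4 * 2 = 18 bits before any symbol list is written.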


package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;

/**
 * This class implements the RTA approach presented in the thesis.
 */
public class MultitreeCompressionManager implements StatsCollected {

    protected Necklace neck;
    protected HashMap[] hashes;
    protected BitStringReader sourceFile;
    protected BitStringWriter targetFile;
    protected BitString remainder;
    protected BitString offset;
    protected boolean lastTreeInspectable;
    protected long headerSize;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeCompressionManager " +
                                  "<sourceFilename> <targetFilename> <n> <offset>");

        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager mcm =
                new MultitreeCompressionManager(sourceFile, targetFile,
                                                n, offset);
            mcm.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN +
                               " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager(BitStringReader sourceFile,
                                       BitStringWriter targetFile,
                                       int n,
                                       int offsetLen)
        throws IOException,
               LengthOutOfBoundsException
    {
        // bounds check
        if (n < 2 || n > BitString.MAXLEN || offsetLen < 0 ||
            offsetLen >= n) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        neck = new Necklace(n);
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        sourceFile.setN(n);
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        lastTreeInspectable = false;
        headerSize = -1;
        hashes = new HashMap[n + 1];
        for (int i = 0; i <= n; i++) {
            hashes[i] = new HashMap();
        }
    }

    public void compress()
        throws IOException
    {
        buildFreqTables();
        compressSourceFile();
    }


    // *********** methods required to implement StatsCollected interface ***********

    public int iBits() {return neck.rotLength();}
    public int jBits() {return neck.indexLength();}
    public int iTree() {return neck.n();}
    public int n() {return neck.n();}
    public int lenOffset() {return offset.length();}
    public String sourceFilename() {return sourceFile.filename();}

    public long sourceFileSize()
        throws StatsNotAvailableException
    {
        long sfs = sourceFile.bitsRead();
        if (sfs > offset.length()) {
            return sfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long targetFileSize()
        throws StatsNotAvailableException
    {
        long tfs = targetFile.bitsWrote();
        if (tfs > 0) {
            return tfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public Iterator lastTreeInspector()
        throws StatsNotAvailableException
    {
        if (lastTreeInspectable) {
            return hashes[neck.n()].values().iterator();
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long headerSize()
        throws StatsNotAvailableException
    {
        if (headerSize == -1) {
            throw new StatsNotAvailableException();
        } else {
            return headerSize;
        }
    }

    public int[] treeSizes()
        throws StatsNotAvailableException
    {
        int n = neck.n();
        if (hashes[n].isEmpty()) {
            throw new StatsNotAvailableException();
        } else {
            int[] numLeafs = new int[n];
            for (int i = 0; i < n; i++) {
                numLeafs[i] = hashes[i].isEmpty() ? 0 : hashes[i].size();
            }
            return numLeafs;
        }
    }

    public void buildFreqTables()
        throws IOException
    {
        buildFreqTables(sourceFile, neck, hashes, remainder);
        lastTreeInspectable = true;
    }

    public void compressSourceFile()
        throws IOException
    {
        lastTreeInspectable = false;
        compressSourceFile(targetFile, sourceFile, neck, hashes,
                           remainder, offset);
    }

    // ************** end of methods for StatsCollected interface


    protected void buildFreqTables(/* IN  */ BitStringReader sourceFile,
                                   /* IN  */ Necklace neck,
                                   /* OUT */ HashMap[] freqTables,
                                   /* OUT */ BitString remainder)
        throws IOException
    {
        short indexLen = (short) neck.indexLength();
        short rotLen = (short) neck.rotLength();
        int n = neck.n();
        BitString bs = sourceFile.readBitString();

        // prescan entire source file and build n + 1 frequency tables
        while (bs.length() == n) {
            int bitPattern = bs.bitPattern();
            int classIndex = neck.bitStringToIndex(bitPattern);
            int rot = neck.bitStringToRots(bitPattern);

            // reuse BitString object to represent class index then
            // throw the class index into freqTable[0...n - 1]
            bs.reInit(indexLen, classIndex);
            if (freqTables[rot].containsKey(bs)) {
                ((VonNoymanNode) (freqTables[rot].get(bs))).incrFreq();
            } else {
                freqTables[rot].put(bs, new VonNoymanNode(bs));
            }

            // throw rotation into freqTable[n]
            BitString r = new BitString(rotLen, rot);
            if (freqTables[n].containsKey(r)) {
                ((VonNoymanNode) (freqTables[n].get(r))).incrFreq();
            } else {
                freqTables[n].put(r, new VonNoymanNode(r));
            }

            bs = sourceFile.readBitString();
        }
        remainder.reInit(bs);
    }

    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN     */ Necklace neck,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws IOException
    {
        int n = neck.n();

        // write compression header
        writeCompressionHeader(targetFile, sourceFile, n, freqTables,
                               remainder, offset);

        HashMap[] encodingMap = freqTables;
        sourceFile.reset();
        sourceFile.readBitString(offset.length());  // throw away offset

        // write sourceFile to targetFile in compressed form
        int indexLen = neck.indexLength();
        int rotLen = neck.rotLength();
        BitString bs1 = sourceFile.readBitString();
        BitString bs2 = new BitString();
        while (bs1.length() == n) {
            int bitPattern = bs1.bitPattern();
            int classIndex = neck.bitStringToIndex(bitPattern);
            int rot = neck.bitStringToRots(bitPattern);

            // write Huffman code for rotations to targetFile
            // recycling BitString objects to avoid unnecessary object creation
            bs1.reInit(rotLen, rot);
            bs2.reInit((BitString) encodingMap[n].get(bs1));
            targetFile.writeBitString(bs2);

            // write Huffman code for class index to targetFile
            // recycling BitString objects
            bs1.reInit(indexLen, classIndex);
            bs2.reInit((BitString) encodingMap[rot].get(bs1));
            targetFile.writeBitString(bs2);

            bs1 = sourceFile.readBitString();
        }
        Assertion.assert(remainder.equals(bs1));
        targetFile.close();
    }


    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int n,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString(5, n));
        targetFile.writeBitString(new BitString(5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString(5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and write
        // header info for each Huffman tree
        for (int i = 0; i <= n; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numBitStringsRead as last entry in header
        long numBitStringsRead = (sourceFile.bitsRead() -
                                  offset.length()) / n;
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString(32,
                                                (int) numBitStringsRead));
        headerSize = targetFile.bitsWrote();
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString(10, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered class index list for the compression header
        BitString[] freqOrderedClassIndexList =
            new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedClassIndexList[i++] =
                ((VonNoymanNode) it.next()).index;
        }

        // build leafs at level list for the compression header
        // (NOTE: this algorithm trashes its TreeSet parameter AND the underlying
        // HashMap upon which it is based)
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel = new BitString(5,
                                                leafsAtLevelList[0].length());
        BitString numLevels = new BitString(5,
                                            leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedClassIndexList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (class index --> Huffman code) encoding map
        HashMap classIndexToHuffmanCodeMap = freqTable;
        for (int j = 0; j < freqOrderedClassIndexList.length; j++) {
            classIndexToHuffmanCodeMap.put(
                freqOrderedClassIndexList[j],
                freqOrderedHuffmanCodes[j]);
        }
    }

}
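RTA splits every n-bit source string into a necklace class and a rotation count through Necklace.bitStringToIndex and Necklace.bitStringToRots, neither of which appears in this part of the listing. The sketch below illustrates one way such a split can be made reversible. It assumes the class is represented by its numerically smallest rotation and that rotations are counted to the left; both conventions, and the names NecklaceSketch, canonicalize and restore, are hypothetical and may differ from the thesis Necklace class, which maps classes to consecutive indices.

```java
/** Hypothetical sketch: split an n-bit string into (necklace representative, rotations). */
final class NecklaceSketch {

    /** Rotate the low n bits of x left by one position (1 <= n <= 32). */
    static int rotateLeft(int x, int n) {
        int mask = (n == 32) ? -1 : (1 << n) - 1;
        return ((x << 1) | (x >>> (n - 1))) & mask;
    }

    /**
     * Returns {representative, rotations}: the smallest rotation of x and
     * the number of left rotations needed to reach it from x.
     */
    static int[] canonicalize(int x, int n) {
        int best = x;
        int bestRot = 0;
        int cur = x;
        for (int r = 1; r < n; r++) {
            cur = rotateLeft(cur, n);
            if (Integer.compareUnsigned(cur, best) < 0) {
                best = cur;
                bestRot = r;
            }
        }
        return new int[] {best, bestRot};
    }

    /** Inverse of canonicalize: finish the full cycle of n left rotations. */
    static int restore(int representative, int rotations, int n) {
        int cur = representative;
        for (int r = 0; r < (n - rotations) % n; r++) {
            cur = rotateLeft(cur, n);
        }
        return cur;
    }

    public static void main(String[] args) {
        int[] split = canonicalize(0b0110, 4);
        System.out.println(split[0] + " " + split[1]);          // prints 3 3
        System.out.println(restore(split[0], split[1], 4));     // prints 6
    }
}
```

Because the pair (representative, rotations) recovers the original string exactly, the two components can be Huffman coded separately, as the rotation tree and the per-rotation class index trees do above.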


package thesis.compression.multi_tree2;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;

/**
 * This class implements the ITA approach discussed in the thesis.
 */
public class MultitreeCompressionManager2 implements StatsCollected {

    protected HashMap[] hashes;
    protected BitStringReader sourceFile;
    protected BitStringWriter targetFile;
    protected int iBits;
    protected int jBits;
    protected BitString remainder;
    protected BitString offset;
    protected int iTree;
    protected boolean lastTreeInspectable;
    protected long headerSize;

    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager2 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager2 mcm2 = new MultitreeCompressionManager2(
                sourceFile, targetFile, iBits, jBits, offset);
            mcm2.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                BitString.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager2(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException,
               LengthOutOfBoundsException
    {
        // bounds check
        if (iBits < 1 || jBits < 1 || iBits + jBits > BitString.MAXLEN ||
            offsetLen < 0 || offsetLen >= iBits + jBits) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        this.iBits = iBits;
        this.jBits = jBits;
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        iTree = (int) (Math.pow(2, iBits));
        lastTreeInspectable = false;
        headerSize = -1;
        hashes = new HashMap[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            hashes[i] = new HashMap();
        }
    }

    // *********** methods required to implement StatsCollected interface ***********

    public int iBits() {return iBits;}
    public int jBits() {return jBits;}
    public int n() {return iBits + jBits;}
    public int iTree() {return iTree;}
    public int lenOffset() {return offset.length();}
    public String sourceFilename() {return sourceFile.filename();}

    public long sourceFileSize()
        throws StatsNotAvailableException
    {
        long sfs = sourceFile.bitsRead();
        if (sfs > offset.length()) {
            return sfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long headerSize()
        throws StatsNotAvailableException
    {
        if (headerSize == -1) {
            throw new StatsNotAvailableException();
        } else {
            return headerSize;
        }
    }

    public long targetFileSize()
        throws StatsNotAvailableException
    {
        long tfs = targetFile.bitsWrote();
        if (tfs > 0) {
            return tfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public Iterator lastTreeInspector()
        throws StatsNotAvailableException
    {
        if (lastTreeInspectable) {
            return hashes[iTree].values().iterator();
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public int[] treeSizes()
        throws StatsNotAvailableException
    {
        if (hashes[iTree].isEmpty()) {
            throw new StatsNotAvailableException();
        } else {
            int[] numLeafs = new int[iTree];
            for (int i = 0; i < iTree; i++) {
                numLeafs[i] = hashes[i].isEmpty() ? 0 : hashes[i].size();
            }
            return numLeafs;
        }
    }

    public void buildFreqTables() throws IOException {
        buildFreqTables(sourceFile, hashes, iBits, jBits, iTree,
                        remainder);
        lastTreeInspectable = true;
    }

    public void compressSourceFile() throws IOException {
        lastTreeInspectable = false;
        compressSourceFile(targetFile, sourceFile, hashes, iBits,
                           jBits, iTree, remainder, offset);
    }

    // ************** end of methods for StatsCollected interface

    public void compress()
        throws IOException
    {
        buildFreqTables();
        compressSourceFile();
    }


    protected void buildFreqTables(/* IN  */ BitStringReader sourceFile,
                                   /* OUT */ HashMap[] freqTables,
                                   /* IN  */ int iBits,
                                   /* IN  */ int jBits,
                                   /* IN  */ int iTree,
                                   /* OUT */ BitString remainder)
        throws IOException
    {
        int lenWord = iBits + jBits;
        BitString peekString = sourceFile.peekBitString(lenWord);

        // prescan entire source file and build iTree + 1 frequency tables
        while (peekString.length() == lenWord) {
            BitString iWord = sourceFile.readBitString(iBits);
            BitString jWord = sourceFile.readBitString(jBits);

            if (freqTables[iWord.bitPattern()].containsKey(jWord)) {
                ((VonNoymanNode) (freqTables[iWord.bitPattern()].get(jWord))).incrFreq();
            } else {
                freqTables[iWord.bitPattern()].put(jWord,
                                                   new VonNoymanNode(jWord));
            }

            // throw iWord into freqTable[iTree]
            if (freqTables[iTree].containsKey(iWord)) {
                ((VonNoymanNode) (freqTables[iTree].get(iWord))).incrFreq();
            } else {
                freqTables[iTree].put(iWord, new VonNoymanNode(iWord));
            }

            peekString = sourceFile.peekBitString(lenWord);
        }
        remainder.reInit(peekString);
        sourceFile.readBitString(peekString.length());  // read and throw away remainder
                                                        // to keep bitsRead() accurate
    }


    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN     */ int iBits,
                                      /* IN     */ int jBits,
                                      /* IN     */ int iTree,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws IOException
    {
        writeCompressionHeader(targetFile, sourceFile, iBits, jBits,
                               iTree, freqTables, remainder, offset);

        HashMap[] encodingMap = freqTables;         // reusing data structure
        sourceFile.reset();                         // read the file AGAIN
        sourceFile.readBitString(offset.length());  // throw away offset

        int lenWord = iBits + jBits;
        BitString peekString = sourceFile.peekBitString(lenWord);
        while (peekString.length() == lenWord) {
            BitString iWord = sourceFile.readBitString(iBits);
            BitString jWord = sourceFile.readBitString(jBits);

            // write Huffman code for iWord to targetFile
            targetFile.writeBitString((BitString) encodingMap[iTree].get(iWord));

            // write Huffman code for jWord to targetFile
            targetFile.writeBitString((BitString) encodingMap[iWord.bitPattern()].get(jWord));

            peekString = sourceFile.peekBitString(lenWord);
        }
        Assertion.assert(remainder.equals(peekString));

        targetFile.close();
    }

    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int iBits,
                                          /* IN     */ int jBits,
                                          /* IN     */ int iTree,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString((short) 5, iBits));
        targetFile.writeBitString(new BitString((short) 5, jBits));
        targetFile.writeBitString(new BitString((short) 5,
                                                offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString((short) 5,
                                                remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and write
        // header info for each Huffman tree
        for (int i = 0; i <= iTree; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numBitStringsRead as last entry in header
        long numBitStringsRead = (sourceFile.bitsRead() -
                                  offset.length()) / (iBits + jBits);
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString((short) 32,
                                                (int) numBitStringsRead));
        headerSize = targetFile.bitsWrote();
    }
    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short) 10, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered j-word list for the compression header
        BitString[] freqOrderedJWordList =
            new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] =
                ((VonNoymanNode) it.next()).index;
        }

        // build leafs at level list for the compression header
        // (NOTE: this algorithm trashes its TreeSet parameter AND the underlying
        // HashMap upon which it is based)
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel = new BitString((short) 5,
                                                leafsAtLevelList[0].length());
        BitString numLevels = new BitString((short) 5,
                                            leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (j-word --> Huffman code) encoding map
        HashMap jWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            jWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                                      freqOrderedHuffmanCodes[k]);
        }
    }

}


package thesis.compression.multi_tree3;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree2.*;

/**
 * This class is identical to MultitreeCompressionManager2 (ITA) except that it
 * uses a slightly more efficient header encoding format. The performance
 * difference between the two techniques is typically insignificant (<0.1%).
 */
public class MultitreeCompressionManager3
    extends MultitreeCompressionManager2
    implements StatsCollected
{

    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager3 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }

            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager3 mcm3 =
                new MultitreeCompressionManager3(
                    sourceFile, targetFile, iBits, jBits, offset);
            mcm3.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                Bitstring.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager3(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException, LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, iBits, jBits, offsetLen);
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT */    BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new Bitstring((short) 5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered j-word list for the compression header
        Bitstring[] freqOrderedJWordList =
            new Bitstring[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] = ((VonNoymanNode) it.next()).index;
        }

        // build and trim leafs at level list for the compression header
        // (NOTE: the VonNoyman algorithm trashes its TreeSet parameter AND
        // the underlying HashMap upon which it is based)
        Bitstring[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);
        Bitstring[] trimedLeafsAtLevelList =
            trimEmptyLeadLevels(leafsAtLevelList);
        int firstLevel = leafsAtLevelList.length -
            trimedLeafsAtLevelList.length;

        // write this tree's portion of the compression header
        Bitstring numNonEmptyLevels = new Bitstring((short) 5,
            trimedLeafsAtLevelList.length);
        Bitstring firstNonEmptyLevel = new Bitstring((short) 5,
            firstLevel);
        Bitstring leafsPerLevel = new Bitstring((short) 5,
            trimedLeafsAtLevelList[0].length());
        targetFile.writeBitString(numNonEmptyLevels);
        targetFile.writeBitString(firstNonEmptyLevel);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(trimedLeafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        Bitstring[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (j-word ==> Huffman code) encoding map
        HashMap JWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            JWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                freqOrderedHuffmanCodes[k]);
        }
    }

    private final Bitstring[] trimEmptyLeadLevels(Bitstring[] leafsAtLevelList) {
        int numNonEmptyLevels = leafsAtLevelList.length;
        int i = 0;
        while (leafsAtLevelList[i].bitPattern() == 0) {
            numNonEmptyLevels--;
            i++;
        }

        Bitstring[] newLeafsAtLevelList = new Bitstring[numNonEmptyLevels];
        System.arraycopy(leafsAtLevelList, i, newLeafsAtLevelList, 0,
            numNonEmptyLevels);
        return newLeafsAtLevelList;
    }

} 
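
The only change MultitreeCompressionManager3 makes to the header is to drop the empty levels at the top of the leafs-at-level list and to record how many were dropped. The round trip can be sketched with plain int counts standing in for Bitstring entries. The class and method names below are illustrative and not part of the thesis code, and the sketch assumes at least one level is non-empty, as processTree guarantees by returning early on an empty table.

```java
import java.util.Arrays;

class LevelTrimSketch {

    /** Drops the leading zero counts; stops at the first non-empty level. */
    static int[] trimEmptyLeadLevels(int[] leafsAtLevel) {
        int first = 0;
        while (first < leafsAtLevel.length && leafsAtLevel[first] == 0) {
            first++;
        }
        return Arrays.copyOfRange(leafsAtLevel, first, leafsAtLevel.length);
    }

    /** Rebuilds the full list from the two values a decoder would read. */
    static int[] restoreLevels(int firstNonEmptyLevel, int[] trimmed) {
        int[] full = new int[firstNonEmptyLevel + trimmed.length];
        System.arraycopy(trimmed, 0, full, firstNonEmptyLevel, trimmed.length);
        return full;
    }

    public static void main(String[] args) {
        int[] levels = {0, 0, 0, 3, 2, 4};
        int[] trimmed = trimEmptyLeadLevels(levels);
        int firstLevel = levels.length - trimmed.length;
        System.out.println(firstLevel + " " + Arrays.toString(trimmed));
        System.out.println(Arrays.toString(restoreLevels(firstLevel, trimmed)));
    }
}
```

Each dropped level saves one leafsPerLevel-wide entry in the header, at the cost of the extra 5-bit firstNonEmptyLevel field.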


package thesis.compression.multi_tree4;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree3.*;

/**
 * This class is identical to MultitreeCompressionManager3 except that it uses
 * fixed length iBit codes instead of Huffman codes from an iBit tree. It was an
 * experiment to see how much of an effect the iBit tree was having on our
 * overall compression results. This code helped us build our theoretical model
 * of ITA.
 */
public class MultitreeCompressionManager4
    extends MultitreeCompressionManager3
    implements StatsCollected
{

    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager4 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }

            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager4 mcm4 =
                new MultitreeCompressionManager4(
                    sourceFile, targetFile, iBits, jBits, offset);
            mcm4.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                Bitstring.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager4(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException, LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, iBits, jBits, offsetLen);
    }


    protected void compressSourceFile(/* OUT */    BitStringWriter targetFile,
                                      /* IN */     BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN */     int iBits,
                                      /* IN */     int jBits,
                                      /* IN */     int iTree,
                                      /* IN */     Bitstring remainder,
                                      /* IN */     Bitstring offset)
        throws IOException
    {
        writeCompressionHeader(targetFile, sourceFile, iBits, jBits,
            iTree, freqTables, remainder, offset);

        HashMap[] encodingMap = freqTables;         // reusing data structure
        sourceFile.reset();                         // read the file AGAIN
        sourceFile.readBitString(offset.length());  // throw away offset

        int lenWord = iBits + jBits;
        Bitstring peekString = sourceFile.peekBitString(lenWord);
        while (peekString.length() == lenWord) {
            Bitstring iWord = sourceFile.readBitString(iBits);
            Bitstring jWord = sourceFile.readBitString(jBits);

            // write iWord (unencoded) to targetFile
            targetFile.writeBitString(iWord);

            // write Huffman code for jWord to targetFile
            targetFile.writeBitString(
                (Bitstring) encodingMap[iWord.bitPattern()].get(jWord));

            peekString = sourceFile.peekBitString(lenWord);
        }

        Assertion.assert(remainder.equals(peekString));
        targetFile.close();
    }

    protected void writeCompressionHeader(/* OUT */    BitStringWriter targetFile,
                                          /* IN */     BitStringReader sourceFile,
                                          /* IN */     int iBits,
                                          /* IN */     int jBits,
                                          /* IN */     int iTree,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN */     Bitstring remainder,
                                          /* IN */     Bitstring offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new Bitstring((short) 5, iBits));
        targetFile.writeBitString(new Bitstring((short) 5, jBits));
        targetFile.writeBitString(new Bitstring((short) 5,
            offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new Bitstring((short) 5,
            remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and
        // write header info for each Huffman tree (EXCLUDING the iTree)
        for (int i = 0; i < iTree; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numWordsRead as last entry in header
        long numWordsRead = (sourceFile.bitsRead() - offset.length()) /
            (iBits + jBits);

        Assertion.assert(numWordsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new Bitstring((short) 32,
            (int) numWordsRead));

        headerSize = targetFile.bitsWrote();
    }

} 
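
MultitreeCompressionManager4 writes each iWord as a fixed-length field and uses its value only to select which encoding map supplies the jWord's code. A minimal sketch of that per-word step follows. Words are modelled as ints and codes as strings of '0' and '1'; the class name, the helper names, and the example maps are assumptions made for illustration and do not appear in the thesis code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FixedPrefixEncodeSketch {

    /** Renders the low `width` bits of value, most significant bit first. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int b = width - 1; b >= 0; b--) {
            sb.append((value >>> b) & 1);
        }
        return sb.toString();
    }

    /** Emits the iWord unencoded, then the jWord's code from tree number iWord. */
    static String encodeWord(int word, int iBits, int jBits,
                             List<Map<Integer, String>> encodingMaps) {
        int iWord = word >>> jBits;
        int jWord = word & ((1 << jBits) - 1);
        return toBits(iWord, iBits) + encodingMaps.get(iWord).get(jWord);
    }

    public static void main(String[] args) {
        // hypothetical maps for iBits = 1, jBits = 2
        List<Map<Integer, String>> maps = new ArrayList<>();
        Map<Integer, String> tree0 = new HashMap<>();
        tree0.put(0, "0");
        tree0.put(1, "10");
        tree0.put(2, "11");
        Map<Integer, String> tree1 = new HashMap<>();
        tree1.put(2, "0");
        tree1.put(3, "1");
        maps.add(tree0);
        maps.add(tree1);

        System.out.println(encodeWord(0b110, 1, 2, maps));
        System.out.println(encodeWord(0b001, 1, 2, maps));
    }
}
```

Because the iWord is never Huffman coded here, the header needs no iTree entry, which is why writeCompressionHeader loops over i < iTree only.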


package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

/**
 * This class decompresses files compressed by the RTA approach presented in
 * the thesis.
 */
public class MultitreeDecompressionManager {

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap[] decodingMaps;
    private int n;
    private Bitstring remainder;
    private Bitstring offset;


    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeDecompressionManager(BitStringReader sourceFile,
                                         BitStringWriter targetFile)
        throws IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;

        n = (sourceFile.readBitString(5)).bitPattern();
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);

        decodingMaps = new HashMap[n + 1];
        for (int i = 0; i <= n; i++) {
            decodingMaps[i] = new HashMap();
        }

        decompressSourceFile(targetFile, sourceFile, n, decodingMaps,
            remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN */  BitStringReader sourceFile,
                                        /* IN */  int n,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN */  Bitstring remainder,
                                        /* IN */  Bitstring offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> class index)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[n + 1];
        for (int i = 0; i <= n; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, n, decodingMaps[i],
                lenOfShortHufCode[i], i == n);
        }

        // build decoding table (necklace class index ==> necklace class)
        long numClasses = Necklace.numClasses(n);
        Assertion.assert(numClasses <= Integer.MAX_VALUE);
        int[] indexToClass = new int[(int) numClasses];
        Necklace.loadTable_indexToClass(n, indexToClass);

        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(2L * numBitStringsCompressed);
        sourceFile.setN(n);

        Bitstring r = null, c = null;
        int rot = -1;

        // get first encoded rotation in sourceFile
        r = sourceFile.decodeBitString(decodingMaps[n],
            lenOfShortHufCode[n].value);

        // read an encoded rotation and class index from sourceFile
        // write an unencoded bitstring to targetFile
        while (r.length() > 0) {
            rot = r.bitPattern();

            // c is class index
            c = sourceFile.decodeBitString(decodingMaps[rot],
                lenOfShortHufCode[rot].value);

            // c is class
            c.reInit(n, indexToClass[c.bitPattern()]);

            // c is original unencoded bitstring !!!
            c.rShift(rot, true);

            // write c to the target file
            targetFile.writeBitString(c);

            // get next encoded rotation in sourceFile
            // a Bitstring of length 0 indicates end of sourceFile
            r = sourceFile.decodeBitString(decodingMaps[n],
                lenOfShortHufCode[n].value);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }


    protected void buildDecodingMap(/* IN */  BitStringReader sourceFile,
                                    /* IN */  int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable,
                                    /* IN */  boolean isRotMap)
        throws IOException
    {
        // build leafsAtLevelList for getHuffmanCodes method
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = (sourceFile.readBitString(5)).bitPattern();
        Bitstring[] leafsAtLevelList = new Bitstring[numLevels];
        for (int i = 0; i < numLevels; i++) {
            leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
        }

        // getHuffmanCodes
        Bitstring[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code ==> class index) mapping
        int lenIndex = isRotMap ? Necklace.rotLength(n) :
            Necklace.indexLength(n);
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                sourceFile.readBitString(lenIndex));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

} 
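
The header stores only the number of leaves at each tree level plus the symbols in frequency order, so the decoder must be able to regenerate every code from those counts alone. VonNoymanNode.getHuffmanCodes is not shown in this part of the listing; the sketch below uses the standard canonical assignment as a stand-in and is an assumption about its behaviour, not a copy of it. Index d of the input is taken to be the number of leaves at depth d, with index 0 the root, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class CanonicalCodeSketch {

    /** Renders the low `width` bits of value, most significant bit first. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int b = width - 1; b >= 0; b--) {
            sb.append((value >>> b) & 1);
        }
        return sb.toString();
    }

    /**
     * Assigns codes level by level, shortest first. The running value is
     * doubled when moving down one level and incremented after each leaf,
     * which keeps the resulting code set prefix-free.
     * (A single-leaf tree with its leaf at depth 0 is not handled here.)
     */
    static List<String> codesFromLevelCounts(int[] leafsAtLevel) {
        List<String> codes = new ArrayList<>();
        int next = 0;
        for (int depth = 1; depth < leafsAtLevel.length; depth++) {
            next <<= 1;
            for (int k = 0; k < leafsAtLevel[depth]; k++) {
                codes.add(toBits(next, depth));
                next++;
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // one leaf at depth 1, one at depth 2, two at depth 3
        System.out.println(codesFromLevelCounts(new int[] {0, 1, 1, 2}));
    }
}
```

Pairing the k-th generated code with the k-th entry of the frequency-ordered symbol list then gives the decoding map, which is the role the loop over numLeafs plays in buildDecodingMap.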


package thesis.compression.multi_tree2;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;

/**
 * An instance of this class decompresses files compressed by
 * MultitreeCompressionManager2.
 */
public class MultitreeDecompressionManager2 {

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap[] decodingMaps;
    private int iBits;
    private int jBits;
    private int iTree;
    private Bitstring remainder;
    private Bitstring offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager2 " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager2(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }


    public MultitreeDecompressionManager2(BitStringReader sourceFile,
                                          BitStringWriter targetFile)
        throws IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;

        iBits = (sourceFile.readBitString(5)).bitPattern();
        jBits = (sourceFile.readBitString(5)).bitPattern();
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);
        iTree = (int) (Math.pow(2, iBits));
        decodingMaps = new HashMap[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            decodingMaps[i] = new HashMap();
        }

        decompressSourceFile(targetFile, sourceFile, iBits, jBits,
            iTree, decodingMaps, remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN */  BitStringReader sourceFile,
                                        /* IN */  int iBits,
                                        /* IN */  int jBits,
                                        /* IN */  int iTree,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN */  Bitstring remainder,
                                        /* IN */  Bitstring offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> jWord)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, iBits, jBits, decodingMaps[i],
                lenOfShortHufCode[i], i == iTree);
        }

        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(2L * numBitStringsCompressed);

        Bitstring iWord = null, jWord = null;

        // get first encoded iWord in sourceFile
        iWord = sourceFile.decodeBitString(decodingMaps[iTree],
            lenOfShortHufCode[iTree].value);

        // read an encoded iWord and jWord from sourceFile
        // write a decoded iWord and jWord to targetFile
        while (iWord.length() > 0) {
            jWord = sourceFile.decodeBitString(
                decodingMaps[iWord.bitPattern()],
                lenOfShortHufCode[iWord.bitPattern()].value);

            targetFile.writeBitString(iWord);
            targetFile.writeBitString(jWord);

            iWord = sourceFile.decodeBitString(decodingMaps[iTree],
                lenOfShortHufCode[iTree].value);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }

    protected void buildDecodingMap(/* IN */  BitStringReader sourceFile,
                                    /* IN */  int iBits,
                                    /* IN */  int jBits,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable,
                                    /* IN */  boolean isIWordMap)
        throws IOException
    {
        // build leafsAtLevelList for getHuffmanCodes method
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = (sourceFile.readBitString(5)).bitPattern();
        Bitstring[] leafsAtLevelList = new Bitstring[numLevels];
        for (int i = 0; i < numLevels; i++) {
            leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
        }

        // getHuffmanCodes
        Bitstring[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code ==> jWord) mapping
        int bits2Read = isIWordMap ? iBits : jBits;
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                sourceFile.readBitString(bits2Read));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

} 
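
Both decompressors pass the length of the shortest code in a map to decodeBitString, which suggests the reader starts with that many bits and extends the candidate one bit at a time until it matches a map key. BitStringReader is not shown in this part of the listing, so the following is a sketch of that idea under this assumption, with bits held in a String. The class name, method name, and example map are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecodeSketch {

    /**
     * Decodes a string of '0'/'1' characters with a prefix-free map
     * (code ==> decoded word). `shortest` is the length of the shortest
     * code, so no lookup is attempted on fewer bits than that.
     */
    static String decode(String bits, Map<String, String> decodingMap,
                         int shortest) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos + shortest <= bits.length()) {
            int end = pos + shortest;
            String word = decodingMap.get(bits.substring(pos, end));
            while (word == null && end < bits.length()) {
                end++;
                word = decodingMap.get(bits.substring(pos, end));
            }
            if (word == null) {
                break;  // trailing bits that do not form a complete code
            }
            out.append(word);
            pos = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("0", "00");
        map.put("10", "01");
        map.put("110", "10");
        map.put("111", "11");
        System.out.println(decode("0101100111", map, 1));
    }
}
```

Starting at the shortest code length skips lookups that cannot succeed, which matters most for trees whose first few levels hold no leaves.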


package thesis.compression.multi_tree3;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree2.*;

/**
 * An instance of this class decompresses a file compressed by
 * MultitreeCompressionManager3.
 */
public class MultitreeDecompressionManager3
    extends MultitreeDecompressionManager2
{

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager3 " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager3(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }


    public MultitreeDecompressionManager3(BitStringReader sourceFile,
                                          BitStringWriter targetFile)
        throws IOException
    {
        super(sourceFile, targetFile);
    }


    protected void buildDecodingMap(/* IN */  BitStringReader sourceFile,
                                    /* IN */  int iBits,
                                    /* IN */  int jBits,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable,
                                    /* IN */  boolean isIWordMap)
        throws IOException
    {
        int numNonEmptyLevels =
            (sourceFile.readBitString(5)).bitPattern();
        if (numNonEmptyLevels == 0) {
            return;
        }

        // build leafsAtLevelList for getHuffmanCodes method
        int firstNonEmptyLevel =
            (sourceFile.readBitString(5)).bitPattern();
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = numNonEmptyLevels + firstNonEmptyLevel;
        Bitstring[] leafsAtLevelList = new Bitstring[numLevels];
        for (int i = 0; i < numLevels; i++) {
            if (i < firstNonEmptyLevel) {
                leafsAtLevelList[i] = new Bitstring((short) lenLevel, 0);
            } else {
                leafsAtLevelList[i] =
                    sourceFile.readBitString(lenLevel);
            }
        }

        // getHuffmanCodes
        Bitstring[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code ==> jWord) mapping
        int bits2Read = isIWordMap ? iBits : jBits;
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                sourceFile.readBitString(bits2Read));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }


} 


package thesis.compression.multi_tree4;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree3.*;

/**
 * An instance of this class decompresses a file compressed by
 * MultitreeCompressionManager4.
 */
public class MultitreeDecompressionManager4
    extends MultitreeDecompressionManager3
{

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager4 " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager4(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeDecompressionManager4(BitStringReader sourceFile,
                                          BitStringWriter targetFile)
        throws IOException
    {
        super(sourceFile, targetFile);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN */  BitStringReader sourceFile,
                                        /* IN */  int iBits,
                                        /* IN */  int jBits,
                                        /* IN */  int iTree,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN */  Bitstring remainder,
                                        /* IN */  Bitstring offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> jWord)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[iTree + 1];
        for (int i = 0; i < iTree; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, iBits, jBits, decodingMaps[i],
                lenOfShortHufCode[i], i == iTree);
        }

        long numWordsToRead =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(numWordsToRead);

        Bitstring iWord = null, jWord = null;

        // get first unencoded iWord in sourceFile
        iWord = sourceFile.readBitString(iBits);

        // read an unencoded iWord and an encoded jWord from sourceFile
        // write iWord and jWord to targetFile
        while (numWordsToRead > 0) {
            jWord = sourceFile.decodeBitString(
                decodingMaps[iWord.bitPattern()],
                lenOfShortHufCode[iWord.bitPattern()].value);

            targetFile.writeBitString(iWord);
            targetFile.writeBitString(jWord);
            numWordsToRead--;

            iWord = sourceFile.readBitString(iBits);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }

} 


package thesis.compression.multi_tree5;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;

/**
 * An instance of this class decompresses a file compressed by
 * MultitreeCompressionManager5.
 */
public class MultitreeDecompressionManager5
    extends MultitreeDecompressionManager
{

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap[] decodingMaps;
    private int n;
    private Bitstring remainder;
    private Bitstring offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager5 " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager5(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeDecompressionManager5(BitStringReader sourceFile,
                                          BitStringWriter targetFile)
        throws IOException
    {
        super(sourceFile, targetFile);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN */  BitStringReader sourceFile,
                                        /* IN */  int n,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN */  Bitstring remainder,
                                        /* IN */  Bitstring offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> class index)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[n + 1];
        for (int i = 0; i <= n; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, n, decodingMaps[i],
                lenOfShortHufCode[i], i == n);
        }

        // build decoding table (necklace class index ==> necklace class)
        long numClasses = Necklace.numClasses(n);
        Assertion.assert(numClasses <= Integer.MAX_VALUE);
        int[] indexToClass = new int[(int) numClasses];
        Necklace.loadTable_indexToClass(n, indexToClass);

        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(2L * numBitStringsCompressed);
        sourceFile.setN(n);

        Bitstring r = null, c = null;
        int rot = -1;

        // get first encoded rotation in sourceFile
        r = sourceFile.decodeBitString(decodingMaps[n],
            lenOfShortHufCode[n].value);

        // read an encoded rotation and class index from sourceFile
        // write an unencoded bitstring to targetFile
        while (r.length() > 0) {
            rot = r.bitPattern();

            // c is class index
            c = sourceFile.decodeBitString(decodingMaps[0],
                lenOfShortHufCode[0].value);

            // c is class
            c.reInit((short) n, indexToClass[c.bitPattern()]);

            // c is original unencoded bitstring !!!
            c.rShift(rot, true);

            // write c to the target file
            targetFile.writeBitString(c);

            // get next encoded rotation in sourceFile
            // a Bitstring of length 0 indicates end of sourceFile
            r = sourceFile.decodeBitString(decodingMaps[n],
                lenOfShortHufCode[n].value);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }

} 
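
The RTA decompressors recover each source bit string from a pair (rotation, class): the class is looked up from its index and then rotated by `rot` positions. The sketch below shows why that pair is enough. It assumes, for illustration only, that the class representative is the numerically smallest rotation and that `rShift(rot, true)` is a circular right shift; neither convention is confirmed by this part of the listing. Bit strings are held in the low n bits of an int with 1 <= n <= 30, and all names are hypothetical.

```java
class NecklaceRotationSketch {

    /** Circularly rotates the low n bits of value right by r positions. */
    static int rotateRight(int value, int n, int r) {
        int mask = (1 << n) - 1;
        value &= mask;
        r %= n;
        if (r == 0) {
            return value;
        }
        return ((value >>> r) | (value << (n - r))) & mask;
    }

    /**
     * Returns {representative, rot}, where representative is the smallest
     * rotation of value and rotating it right by rot gives value back.
     */
    static int[] toClassAndRotation(int value, int n) {
        int best = rotateRight(value, n, 0);
        int bestRot = 0;
        for (int r = 1; r < n; r++) {
            // rotating left by r is the same as rotating right by n - r
            int candidate = rotateRight(value, n, n - r);
            if (candidate < best) {
                best = candidate;
                bestRot = r;
            }
        }
        return new int[] {best, bestRot};
    }

    public static void main(String[] args) {
        int n = 4;
        int value = 0b0110;
        int[] pair = toClassAndRotation(value, n);
        int restored = rotateRight(pair[0], n, pair[1]);
        System.out.println(pair[0] + " " + pair[1] + " " + restored);
    }
}
```

Since a rotation count is always below n, it fits in ceil(log2 n) bits, which matches the rotLength(n) width the decoders read for rotation entries.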


package thesis.compression.multi_tree;

import java.util.*;
import java.lang.*;
import java.io.*;


/**
 * This class contains all the supporting methods needed to implement the RTA
 * approach presented in the thesis. See chapter 4 for an explanation of the r
 * and i values referred to throughout the code.
 */
public class Necklace {

    private final int[] bitStringToIndex;
    private final byte[] bitStringToRots;
    private final int[] indexToClass;
    private final int n;
    private final long numClasses;
    private final int indexLen;
    private final int rotLen;

    /** Returns the length in bits of the i associated with a symbol of length n >= 1. */
    public static final int indexLength(int n) {
        return n - rotLength(n) + 1;
    }

    /** Returns the length in bits of the r associated with a symbol of length n >= 1. */
    public static final int rotLength(int n) {
        return (int) Math.ceil(Math.log(n) / Math.log(2));
    }

    /** Helper function for numClasses(n) */
    public static int gcdEuclid(int a, int b) {
        if (b == 0) {
            return a;
        } else {
            return gcdEuclid(b, a % b);
        }
    }

    /** Helper function for numClasses(n) */
    public static final boolean areRelativelyPrime(int a, int b) {
        return gcdEuclid(a, b) == 1;
    }

    /** Helper function for numClasses(n): counts the i in 1..d relatively prime to d */
    public static final int theta(int d) {
        int count = 0;
        for (int i = 1; i <= d; i++) {
            if (areRelativelyPrime(d, i)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Uses the Burnside formula to calculate the exact number of classes for a given
     * value of n.
     */
    public static final long numClasses(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) {
                sum += theta(d) * Math.pow(2, n / d);
            }
        }
        return sum / n;
    }

    /**
     * The necklace algorithm. This method builds a table which allows the efficient
     * lookup of a necklace class based upon its index in a list of all such classes.
     *
     * PRECONDITION:  indexToClass.length == numClasses(n)
     * POSTCONDITION: indexToClass contains all the necklace classes of length n
     *                as given by the necklace algorithm described in chapter 3 of the
     *                thesis.
     */
    public static void loadTable_indexToClass(/* IN  */ int n,
                                              /* OUT */ int[] indexToClass)
    {
        int k = 0;                                   // index to store next class at
        BitString c = new BitString((short) n, -1);  // the first necklace class
        indexToClass[k++] = c.bitPattern();          // store it at index 0 of classes

        while (c.bitPattern() != 0) {
            int j = c.leastSig1();
            c.clearBit(j);                           // replace least sig 1 with 0
            int jj = j - 1;
            int i = c.length() - 1;                  // most significant bit
            while (jj >= 0) {                        // copy over, ...
                c.assignBit(jj, c.bitAt(i));         // copy over,
                jj--;
                i--;
            }
            if ((n % (n - j)) == 0) {                // j check
                indexToClass[k++] = c.bitPattern();
            }
        }
    }


    /**
     * This method builds two tables of size 2^n. The bitStringToIndex table allows
     * the efficient lookup of the class mapped to by any bit string of length n. The
     * bitStringToRots table allows the efficient lookup of the number of rotations
     * required to map any bit string of length n to its necklace class.
     */
    public static void loadTables_bitStringToIndex_bitStringToRots(
            /* IN  */ int n,
            /* IN  */ int[] indexToClass,
            /* IN  */ long numClasses,
            /* OUT */ int[] bitStringToIndex,
            /* OUT */ byte[] bitStringToRots)
    {
        for (int i = 0; i < numClasses; i++) {
            BitString bs = new BitString(n, indexToClass[i]);
            int j = 0;
            do {
                bitStringToIndex[bs.bitPattern()] = i;
                bitStringToRots[bs.bitPattern()] = (byte) j;
                j++;
                bs.rShift(true);
            } while (bs.bitPattern() != indexToClass[i]);
        }
    }

    /**
     * Constructs a Necklace object of length n. This object provides access to all
     * the tables and methods needed to efficiently work with binary necklaces of length n.
     */
    public Necklace(int n) {
        this.n = n;

        numClasses = numClasses(n);
        Assertion.assert(numClasses <= Integer.MAX_VALUE);
        indexToClass = new int[(int) numClasses];
        loadTable_indexToClass(n, indexToClass);

        bitStringToIndex = new int[(int) (Math.pow(2, n))];
        bitStringToRots = new byte[bitStringToIndex.length];
        loadTables_bitStringToIndex_bitStringToRots(
            n, indexToClass, numClasses, bitStringToIndex,
            bitStringToRots);

        indexLen = indexLength(n);
        rotLen = rotLength(n);
    }

    /** Returns the index of the necklace class mapped to by bitstring */
    public final int bitStringToIndex(int bitstring) {
        return bitStringToIndex[bitstring];
    }

    /** Returns the # of rotations required to map bitstring to its necklace class */
    public final byte bitStringToRots(int bitstring) {
        return bitStringToRots[bitstring];
    }

    /** Returns the necklace class at the given index. */
    public final int indexToClass(int index) {
        return indexToClass[index];
    }

    /** Returns the value of n being used by this Necklace object */
    public final int n() {
        return n;
    }

    /** Returns the number of classes associated with this Necklace object */
    public final long numClasses() {
        return numClasses;
    }

    /** Returns the length in bits of the r's associated with this Necklace object */
    public final int rotLength() {
        return rotLen;
    }

    /** Returns the length in bits of the i's associated with this Necklace object */
    public final int indexLength() {
        return indexLen;
    }
}
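
The class count returned by numClasses(n) can be cross-checked for small n by direct enumeration. The sketch below is a standalone illustration, not part of the thesis code: NecklaceCount, burnside and bruteForce are hypothetical names, exact long shifts replace Math.pow, and the brute-force side counts the strings that equal the largest of their own rotations (one such string per class).

```java
final class NecklaceCount {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Euler's totient: how many i in 1..d are relatively prime to d. */
    static int totient(int d) {
        int count = 0;
        for (int i = 1; i <= d; i++) {
            if (gcd(d, i) == 1) count++;
        }
        return count;
    }

    /** Burnside count: (1/n) * sum over divisors d of n of totient(d) * 2^(n/d). */
    static long burnside(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) sum += totient(d) * (1L << (n / d));
        }
        return sum / n;
    }

    /** Brute force: count the n-bit strings that equal the largest of their rotations. */
    static long bruteForce(int n) {
        long count = 0;
        for (int x = 0; x < (1 << n); x++) {
            int cur = x;
            boolean isMax = true;
            for (int k = 1; k < n; k++) {
                cur = (cur >>> 1) | ((cur & 1) << (n - 1)); // circular right shift
                if (cur > x) { isMax = false; break; }
            }
            if (isMax) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n++) {
            System.out.println(n + ": " + burnside(n) + " " + bruteForce(n));
        }
    }
}
```

For n = 8 the divisor sum is 1*256 + 1*16 + 2*4 + 4*2 = 288, and 288 / 8 = 36 classes.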


package thesis.compression.multi_tree;


import java.util.*;
import java.io.*;

import thesis.compression.huffman.*;


/**
 * Objects of this class are used to collect compression statistics for any method that
 * implements the StatsCollected interface. Each Stats object holds statistics for
 * one implementor of StatsCollected.
 */
public class Stats implements Comparable {

    private String filename;       // complete filename including extension
    private String ext;            // filename extension
    private int n;                 // word length used for compression
    private int iBits;             // num bits to specify index of Huffman tree
    private int jBits;             // num bits to specify leaf node of Huffman tree
    private int iTree;             // highest tree index (iWord tree)
    private int lenOffset;         // # of uncompressed leading bits in sourceFile

    private long sizeBefore;       // in bits
    private long sizeAfter;        // in bits
    private double[] stringDistr;  // % of source file mapped to each Huffman tree
    private int[] classDistr;      // # classes mapped to each Huffman tree
    private long sizeAfterHuf;     // file size achieved using straight Huffman
    private long headerSize;       // header size



    public Stats(StatsCollected mcm)
        throws IOException, StatsNotAvailableException
    {
        n = mcm.n();
        iBits = mcm.iBits();
        jBits = mcm.jBits();
        iTree = mcm.iTree();
        lenOffset = mcm.lenOffset();
        filename = mcm.sourceFilename();
        // NOTE: the indexOf argument and the else-branch below are cut off in the
        // scanned listing; '.' and "" are assumed
        int dotPos = filename.indexOf('.');
        ext = (dotPos != -1 ? filename.substring(dotPos) : "");

        mcm.buildFreqTables();
        sizeBefore = mcm.sourceFileSize();
        long numBitStrings = sizeBefore / n;  // int div
        stringDistr = new double[iTree];
        Arrays.fill(stringDistr, 0);
        Iterator it = mcm.lastTreeInspector();
        while (it.hasNext()) {
            VonNoymanNode node = (VonNoymanNode) it.next();
            stringDistr[node.index.bitPattern()] =
                (double) node.freq / (double) numBitStrings * 100d;
        }

        classDistr = mcm.treeSizes();
        mcm.compressSourceFile();
        sizeAfter = mcm.targetFileSize();
        headerSize = mcm.headerSize();
        mcm = null;
    }

    public Stats(StatsCollected mcm,
                 HuffmanCompressionManager hcm)
        throws IOException, StatsNotAvailableException
    {
        this(mcm);
        hcm.compress();
        sizeAfterHuf = hcm.sizeAfter();
        hcm = null;
    }

    public final int compareTo(Stats s) {
        if (ext.equals(s.ext)) {
            if (filename.equals(s.filename)) {
                if (n == s.n) {
                    if (iBits == s.iBits) {
                        return lenOffset - s.lenOffset;
                    } else {
                        return iBits - s.iBits;
                    }
                } else {
                    return n - s.n;
                }
            } else {
                return filename.compareTo(s.filename);
            }
        } else {
            return ext.compareTo(s.ext);
        }
    }

    public final int compareTo(Object obj) {
        return compareTo((Stats) obj);
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        String nl = new String("\r\n");

        sb.append("FILENAME : ");
        sb.append(filename);
        sb.append(nl);
        sb.append("n : ");
        sb.append(n);
        sb.append(nl);
        sb.append("iBits : ");
        sb.append(iBits);
        sb.append(nl);
        sb.append("jBits : ");
        sb.append(jBits);
        sb.append(nl);
        sb.append("offset : ");
        sb.append(lenOffset);
        sb.append(nl);
        sb.append(nl);

        sb.append("file size: ");
        sb.append(toKB(sizeBefore));
        sb.append(nl);
        sb.append(nl);

        sb.append("header size: ");
        sb.append(headerSize / 8);
        sb.append(" bytes");
        sb.append(nl);
        sb.append(nl);

        if (sizeAfterHuf > 0) {
            sb.append("HUF size : ");
            sb.append(toKB(sizeAfterHuf));
            sb.append(nl);
            sb.append("change : ");
            sb.append(change(sizeBefore, sizeAfterHuf));
            sb.append(nl);
            sb.append(nl);
        }

        sb.append("MTC size : ");
        sb.append(toKB(sizeAfter));
        sb.append(nl);
        sb.append("change : ");
        sb.append(change(sizeBefore, sizeAfter));
        sb.append(nl);
        sb.append(nl);

        sb.append("tree strings leafs");
        sb.append(nl);
        double tmp;
        // NOTE: the column padding inside the string literals of this method is not
        // recoverable from the scanned listing; the widths shown are placeholders
        for (int i = 0; i < iTree; i++) {
            sb.append(i);
            sb.append((i < 10 ? "  " : " "));
            tmp = round(stringDistr[i], 0.1);
            sb.append(tmp);
            sb.append("%");
            sb.append((tmp < 10 ? "  " : " "));
            sb.append(classDistr[i]);
            sb.append(nl);
        }

        return sb.toString();
    }

    private final String change(long orig, long cur) {
        long diff = cur - orig;
        return String.valueOf(round((double) diff / (double) orig * 100d,
                                    0.1d)) + "%";
    }

    private final String toKB(long bits) {
        return String.valueOf(round((double) bits / 8d / 1024d, 0.001d))
            + " KB";
    }

    private final double round(double number,
                               double placeValue) {
        double recipPlaceValue = 1.0 / placeValue;
        return ((int) (number * recipPlaceValue + 0.5d) / recipPlaceValue);
    }
}
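
One detail of the round helper above is worth illustrating: the int cast truncates toward zero, so the add-0.5 trick rounds to the nearest place value only for non-negative inputs, while change() passes a negative percentage whenever the compressed file is smaller than the original. The sketch below compares the listing's arithmetic with one possible variant based on Math.floor. The class and method names are hypothetical and the variant is a suggestion, not the author's code.

```java
final class RoundingSketch {
    /** Same arithmetic as the listing: add 0.5, then truncate with an int cast. */
    static double roundTruncating(double number, double placeValue) {
        double recip = 1.0 / placeValue;
        return (int) (number * recip + 0.5d) / recip;
    }

    /** Variant using Math.floor, which rounds half up for negative inputs as well. */
    static double roundFloor(double number, double placeValue) {
        double recip = 1.0 / placeValue;
        return Math.floor(number * recip + 0.5d) / recip;
    }

    public static void main(String[] args) {
        System.out.println(roundTruncating(12.34, 0.1));   // positive input
        System.out.println(roundTruncating(-12.34, 0.1));  // negative input, truncated toward zero
        System.out.println(roundFloor(-12.34, 0.1));       // negative input, floor-based
    }
}
```

With a place value of 0.1, the truncating version maps -12.34 to -12.2, whereas the floor-based variant yields -12.3.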

package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;

/**
 * This interface declares methods needed to collect statistics on the various
 * compression classes (like HuffmanCompressionManager and MultitreeCompressionManager).
 * Classes which implement this interface can be analyzed by the StatsManager.
 */
public interface StatsCollected
{
    public int iBits();
    public int jBits();
    public int iTree();
    public int n();
    public int lenOffset();
    public String sourceFilename();

    public long sourceFileSize() throws StatsNotAvailableException;
    public long targetFileSize() throws StatsNotAvailableException;
    public Iterator lastTreeInspector() throws StatsNotAvailableException;
    public long headerSize() throws StatsNotAvailableException;
    public int[] treeSizes() throws StatsNotAvailableException;

    public void buildFreqTables() throws IOException;
    public void compressSourceFile() throws IOException;
}


package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;
import java.text.*;

import thesis.compression.huffman.*;
import thesis.compression.huffman2.*;
import thesis.compression.multi_tree2.*;
import thesis.compression.multi_tree3.*;
import thesis.compression.multi_tree4.*;
import thesis.compression.multi_tree5.*;


/**
 * This static class drives a simple text based menu which allows testing of all
 * implemented compression algorithms. The main method of this class is the standard
 * entry point into all the compression algorithms created for this thesis.
 */
public class StatsManager {

    public static final String szOutDir = "c:\\My Documents\\compressedFiles\\";
    public static final String szInDir = "c:\\My Documents\\filesToCompress\\";

    private static TreeSet registeredStats = new TreeSet();


    public static void main(String[] args) throws Exception {

        System.out.println("\nMultitree Compression Tool, May 6, 2001\n");
        System.out.println("USAGE");
        // NOTE: the end of the next statement is cut off in the scanned listing
        System.out.println("1) Create the directory " + szInDir);
        System.out.println("2) Put all files to compress in specified directory.");
        System.out.println("3) Follow on screen prompts.");
        System.out.println("4) Compressed files & date stamped stats file at " + szOutDir);

        BufferedReader inUser = new BufferedReader(new InputStreamReader(System.in));

        System.out.println("\nselect an MTC method");
        System.out.println("1) rotational trees");
        System.out.println("2) rotational trees (tree 0 & N only)");
        System.out.println("3) indexed trees");
        System.out.println("4) indexed trees + tight header scheme");
        System.out.println("5) indexed trees + tight header scheme - Huffman tree for iWords");
        System.out.print("Selection: ");
        int method = Integer.parseInt(inUser.readLine());
        System.out.println();

        File outDir = new File(szOutDir);
        if (!outDir.exists() && !outDir.mkdir()) {
            throw new IOException("Can't create " + szOutDir +
                "!\nCreate the directory " +
                "manually or change file permissions then restart this tool.\n");
        }

        File inDir = new File(szInDir);
        if (!inDir.exists()) {
            throw new FileNotFoundException(
                "Input directory " + szInDir + " does not exist!\n");
        }

        String[] sourceFiles = inDir.list();
        if (sourceFiles == null || sourceFiles.length == 0) {
            throw new FileNotFoundException(
                "No source files found in '" + szInDir + "'!!!");
        }

        String[] targetFiles = new String[sourceFiles.length];
        for (int i = 0; i < targetFiles.length; i++) {
            // NOTE: the indexOf argument is cut off in the scanned listing; '.' is assumed
            int dotPos = sourceFiles[i].indexOf('.');
            if (dotPos > 0) {
                targetFiles[i] = sourceFiles[i].substring(0, dotPos);
            } else {
                targetFiles[i] = sourceFiles[i];
            }
        }

        switch (method) {
            case 1:
            case 2:  rotationalTrees(inUser, outDir, inDir,
                                     sourceFiles, targetFiles, method);
                     break;
            case 3:
            case 4:
            case 5:  indexedTrees(inUser, outDir, inDir,
                                  sourceFiles, targetFiles, method);
                     break;
            default: throw new Exception(
                         "\nInvalid selection - restart the tool and try again.");
        }
    }

    private static void rotationalTrees(BufferedReader inUser,
                                        File outDir,
                                        File inDir,
                                        String[] sourceFiles,
                                        String[] targetFiles,
                                        int mtcMethod)
        throws Exception
    {
        System.out.print("Enter a lower bound for N: ");
        int lowN = Integer.parseInt(inUser.readLine());
        System.out.print("Enter an upper bound for N: ");
        int highN = Integer.parseInt(inUser.readLine());

        boolean offsetsOn = false;
        System.out.print("Turn offsets ON (y/n)? ");
        if (inUser.readLine().equalsIgnoreCase("y")) {
            offsetsOn = true;
        }

        boolean hufCompOn = false;
        System.out.print("Turn Huffman comparisons ON (y/n)? ");
        if (inUser.readLine().equalsIgnoreCase("y")) {
            hufCompOn = true;
        }

        System.out.print("\nTable building memory requirements for this run: ");
        System.out.print((long) (5 * Math.pow(2, highN) + 4 *
                                 Necklace.numClasses(highN)));
        System.out.println(" bytes");

        System.out.println("\nPROGRESS:");
        for (int i = 0; i < sourceFiles.length * (highN - lowN + 1); i++) {
            System.out.print(".");
        }
        System.out.println();

        for (int i = 0; i < sourceFiles.length; i++) {
            for (int n = lowN; n <= highN; n++) {
                for (int offsetLen = 0; offsetLen == 0 || (offsetsOn &&
                        offsetLen < n); offsetLen++) {

                    BitStringReader bsr = new BitStringReader(szInDir +
                        sourceFiles[i], n);
                    // NOTE: the separator literal between n and offsetLen is lost in the
                    // scanned listing; "_" is assumed
                    BitStringWriter bsw = new BitStringWriter(
                        szOutDir + targetFiles[i] + n + "_" + offsetLen
                        + ".MTC" +
                        (mtcMethod == 1 ? "1" : "5"));

                    StatsCollected mcm;
                    if (mtcMethod == 1) {
                        mcm = new MultitreeCompressionManager(bsr, bsw,
                            n, offsetLen);
                    } else {
                        mcm = new MultitreeCompressionManager5(bsr,
                            bsw, n, offsetLen);
                    }

                    if (hufCompOn) {
                        BitStringReader bsr2 = new
                            BitStringReader(szInDir + sourceFiles[i], n);
                        BitStringWriter bsw2 = new BitStringWriter(
                            szOutDir + targetFiles[i] + n + "_" +
                            offsetLen + ".HUF");
                        HuffmanCompressionManager hcm =
                            new HuffmanCompressionManager(bsr2, bsw2,
                                n, offsetLen);
                        registerStats(new Stats(mcm, hcm));
                    } else {
                        registerStats(new Stats(mcm));
                    }
                }
                System.out.print(".");
            }
        }
        System.out.println("\n");

        outputStats(mtcMethod + " (rotational trees)");


    }

    private static void indexedTrees(BufferedReader inUser,
                                     File outDir,
                                     File inDir,
                                     String[] sourceFiles,
                                     String[] targetFiles,
                                     int mtcMethod)
        throws Exception
    {
        String userPair = null;
        int indexOfSpace = -1, userI = -1, userJ = -1, lowN =
            BitString.MAXLEN;
        ArrayList arrayI = new ArrayList(), arrayJ = new ArrayList();

        System.out.print("Enter a space delimited iBits jBits pair (like 4 12 or 8 8): ");
        userPair = inUser.readLine();
        while (userPair != null && userPair.length() >= 3) {
            indexOfSpace = userPair.indexOf(' ');
            userI = Integer.parseInt(userPair.substring(0, indexOfSpace));
            userJ = Integer.parseInt(userPair.substring(indexOfSpace + 1));
            arrayI.add(new Integer(userI));
            arrayJ.add(new Integer(userJ));
            if (lowN > userI + userJ) {
                lowN = userI + userJ;
            }
            System.out.print("Enter another pair or <RETURN> to begin compression: ");
            userPair = inUser.readLine();
        }

        System.out.print("Enter an offset bound (0 to " + (lowN - 1) + "): ");
        int offsetBound = Integer.parseInt(inUser.readLine());
        if (offsetBound < 0) {
            offsetBound = 0;
        } else if (offsetBound > lowN - 1) {
            offsetBound = lowN - 1;
        }

        boolean hufCompOn = false;
        System.out.print("Turn Huffman comparisons ON (y/n)? ");
        if (inUser.readLine().equalsIgnoreCase("y")) {
            hufCompOn = true;
        }

        System.out.println("\nPROGRESS:");
        for (int a = 0; a < sourceFiles.length * arrayI.size() *
                (offsetBound + 1); a++) {
            System.out.print(".");
        }
        System.out.println();

        for (int a = 0; a < sourceFiles.length; a++) {
            for (int b = 0; b < arrayI.size(); b++) {
                for (int c = 0; c <= offsetBound; c++) {
                    userI = ((Integer) arrayI.get(b)).intValue();
                    userJ = ((Integer) arrayJ.get(b)).intValue();
                    BitStringReader bsr = new BitStringReader(szInDir +
                        sourceFiles[a]);
                    // NOTE: the separator literals in the file names below are partly lost
                    // in the scanned listing; "_" is assumed throughout
                    BitStringWriter bsw = new BitStringWriter(
                        szOutDir + targetFiles[a] + userI + "_" + userJ
                        + "_" + c + ".MTC" +
                        (mtcMethod - 1));

                    StatsCollected mcm;
                    if (mtcMethod == 3) {
                        mcm = new MultitreeCompressionManager2(bsr,
                            bsw, userI, userJ, c);
                    } else if (mtcMethod == 4) {
                        mcm = new MultitreeCompressionManager3(bsr,
                            bsw, userI, userJ, c);
                    } else {
                        mcm = new MultitreeCompressionManager4(bsr,
                            bsw, userI, userJ, c);
                    }

                    if (hufCompOn) {
                        int n = userI + userJ;
                        BitStringReader bsr2 = new
                            BitStringReader(szInDir + sourceFiles[a], n);
                        BitStringWriter bsw2 = new BitStringWriter(
                            szOutDir + targetFiles[a] + n + "_" + c +
                            ".HUF");
                        HuffmanCompressionManager2 hcm2 =
                            new HuffmanCompressionManager2(bsr2, bsw2,
                                n, c);
                        registerStats(new Stats(mcm, hcm2));
                    } else {
                        registerStats(new Stats(mcm));
                    }
                    System.out.print(".");
                }
            }
        }
        System.out.println("\n");

        outputStats(mtcMethod + " (indexed trees)");
    }

    public static final void registerStats(Stats stats) {
        registeredStats.add(stats);
    }

    public static void outputStats(String method)
        throws FileNotFoundException
    {
        SimpleDateFormat formatter = new SimpleDateFormat("D_hh_mm_ss");
        Date date = new Date();
        String szDate = formatter.format(date);
        PrintStream statsFile = new PrintStream(
            new FileOutputStream(szOutDir + "stats" + szDate + ".txt"));
        PrintStream out = System.out;

        for (int i = 0; i < 2; i++) {
            System.setOut(i == 0 ? out : statsFile);
            System.out.println("Compression method: " + method);
            System.out.println("Compressions performed: " +
                registeredStats.size());
            System.out.println();
            Iterator it = registeredStats.iterator();
            while (it.hasNext()) {
                System.out.println(it.next());
                System.out.println("-");
                System.out.println();
            }
        }
        statsFile.close();
        System.setOut(out);
    }
}


package thesis.compression.multi_tree;


public class StatsNotAvailableException extends Exception {
    public StatsNotAvailableException() {
        super("StatsCollected method calls out of sequence");
    }

    public StatsNotAvailableException(String str) {
        super(str);
    }
}


package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;

/**
 * Poorly named class which implements the Neuman (not John VonNoyman) technique of
 * building a Huffman tree. The technique represents the leaves of a Huffman tree using
 * a decimal number. For example: 0.0235 indicates that the tree has 0 leaves at level
 * 0, 0 leaves at level 1, 2 leaves at level 2, 3 leaves at level 3, and 5 leaves at level
 * 4. Using this technique a Huffman tree is built from a frequency table by a
 * series of decimal shift and addition operations which seems more convenient than the
 * actual construction of a binary tree.
 */
public class VonNoymanNode implements Comparable {

    public long freq;
    public int[] leafsAtLevel;
    public BitString index;

    /**
     * Returns the decimal digits that represent the leaves at each level of the
     * Huffman tree built from the frequency table represented by freqOrderedVNN.
     * The decimal digits are returned as an array of integers as each digit may
     * exceed 9 depending upon the size of the resultant Huffman tree.
     *
     * PRECONDITION:  freqOrderedVNN is not empty
     * POSTCONDITION: freqOrderedVNN is trashed
     */
    public static int[] vonNoymanAlgorithmInt(/* IN/OUT */ TreeSet freqOrderedVNN)
        throws NoSuchElementException
    {
        // special case
        if (freqOrderedVNN.size() == 1) {
            return new int[] {0, 1};
        }

        // PLC: vnn1.leafsAtLevel contains the # of leafs at each level of Huffman tree
        VonNoymanNode vnn1, vnn2;
        do {
            vnn1 = (VonNoymanNode) freqOrderedVNN.first();
            freqOrderedVNN.remove(vnn1);
            if (freqOrderedVNN.isEmpty()) break;
            vnn2 = (VonNoymanNode) freqOrderedVNN.first();
            freqOrderedVNN.remove(vnn2);
            vnn1.add(vnn2);
            freqOrderedVNN.add(vnn1);
        } while (true);

        return vnn1.leafsAtLevel;
    }

    /**
     * Identical to vonNoymanAlgorithmInt(freqOrderedVNN) except that the resultant
     * list of decimal digits is returned as a BitString[] instead of an int[].
     */
    public static final BitString[] vonNoymanAlgorithm(/* IN/OUT */ TreeSet freqOrderedVNN)
        throws NoSuchElementException
    {
        int[] leafsAtLevel = vonNoymanAlgorithmInt(freqOrderedVNN);
        BitString[] retValue = BitString.toBitStringArr(leafsAtLevel);
        Assertion.assert(leafsAtLevel.length == retValue.length);
        return retValue;
    }

    /**
     * Returns an array of BitStrings which represents the Huffman codes of a Huffman
     * tree built using leafsAtLevelList. The codes are ordered from shortest to
     * longest.
     */
    public static BitString[] getHuffmanCodes(BitString[] leafsAtLevelList) {

        // need one Huffman code per leaf so count leaves
        int numCodesNeeded = 0;
        for (int i = 0; i < leafsAtLevelList.length; i++) {
            numCodesNeeded += leafsAtLevelList[i].bitPattern();
        }

        BitString[] retValue = new BitString[numCodesNeeded];
        int pos = 0;
        int value = 0;
        for (int i = 0; i < leafsAtLevelList.length; i++) {              // level
            for (int j = 0; j < leafsAtLevelList[i].bitPattern(); j++) { // leaf
                // NOTE: the scanned listing reads "numCodesNeeded - i - pos"; "- 1 -" is
                // assumed, since "- i -" would index below 0 for the deepest level
                retValue[numCodesNeeded - 1 - pos] = new
                    BitString((short) (i), value);
                pos++;
                value++;
            }
            value <<= 1;
        }

        return retValue;
    }

    /** Constructor */
    public VonNoymanNode(BitString index) {
        freq = 1;
        leafsAtLevel = new int[] {1};
        this.index = index;
    }

    /**
     * Helper method for VonNoymanAlgorithm methods.
     */
    public void add(VonNoymanNode rValue) {

        // add freq attributes
        freq += rValue.freq;

        // add leafsAtLevel attributes
        int[] longer, shorter;
        if (leafsAtLevel.length > rValue.leafsAtLevel.length) {
            longer = leafsAtLevel;
            shorter = rValue.leafsAtLevel;
        } else {
            longer = rValue.leafsAtLevel;
            shorter = leafsAtLevel;
        }

        int[] temp = new int[longer.length + 1];
        temp[0] = 0;
        for (int i = 0; i < longer.length; i++) {
            temp[i + 1] = longer[i] + (i < shorter.length ? shorter[i] : 0);
        }
        leafsAtLevel = temp;

        // ignore classindex attribute for this operation
    }

    public final void incrFreq() {
        freq++;
    }

    public final String toString() {
        // NOTE: the end of this expression is cut off in the scanned listing; ")" is assumed
        return index.toString() + "\t(" + freq + ")";
    }

    public final boolean equals(Object obj) {
        if ((obj != null) && (obj instanceof VonNoymanNode)) {
            VonNoymanNode vnn = (VonNoymanNode) obj;
            return freq == vnn.freq && index.equals(vnn.index);
        }
        return false;
    }

    public final int compareTo(VonNoymanNode vnn) {
        long diff = freq - vnn.freq;
        if (diff > 0) {
            return 1;
        } else if (diff < 0) {
            return -1;
        } else {
            return index.compareTo(vnn.index);
        }
    }

    public final int compareTo(Object obj) {
        return compareTo((VonNoymanNode) obj);
    }
}
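
The leaves-per-level technique used by VonNoymanNode can be illustrated in isolation. The sketch below is a standalone restatement under stated assumptions, not the thesis code: LevelCountHuffman and its methods are hypothetical names, a java.util.PriorityQueue with generics stands in for the TreeSet of nodes, ties are broken arbitrarily, and codes are returned as strings of '0' and '1' rather than BitString objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class LevelCountHuffman {
    /** A partial tree: total frequency plus the number of leaves at each depth. */
    static final class Node implements Comparable<Node> {
        final long freq;
        final int[] leavesAtLevel;
        Node(long freq, int[] leavesAtLevel) {
            this.freq = freq;
            this.leavesAtLevel = leavesAtLevel;
        }
        public int compareTo(Node o) { return Long.compare(freq, o.freq); }
    }

    /** Merge two partial trees under a new root: every leaf moves one level deeper. */
    static Node merge(Node a, Node b) {
        int len = Math.max(a.leavesAtLevel.length, b.leavesAtLevel.length);
        int[] merged = new int[len + 1];
        for (int i = 0; i < len; i++) {
            int fromA = i < a.leavesAtLevel.length ? a.leavesAtLevel[i] : 0;
            int fromB = i < b.leavesAtLevel.length ? b.leavesAtLevel[i] : 0;
            merged[i + 1] = fromA + fromB;
        }
        return new Node(a.freq + b.freq, merged);
    }

    /** Repeatedly merge the two least frequent partial trees; no binary tree is built. */
    static int[] leavesPerLevel(long[] freqs) {
        if (freqs.length == 1) return new int[] {0, 1}; // single symbol: one 1-bit code
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (long f : freqs) queue.add(new Node(f, new int[] {1}));
        while (queue.size() > 1) queue.add(merge(queue.poll(), queue.poll()));
        return queue.poll().leavesAtLevel;
    }

    /** Canonical codes as bit strings, shortest first. */
    static List<String> codes(int[] leavesAtLevel) {
        List<String> out = new ArrayList<>();
        int value = 0;
        for (int level = 0; level < leavesAtLevel.length; level++) {
            for (int j = 0; j < leavesAtLevel[level]; j++) {
                StringBuilder sb = new StringBuilder();
                for (int bit = level - 1; bit >= 0; bit--) sb.append((value >> bit) & 1);
                out.add(sb.toString());
                value++;
            }
            value <<= 1; // moving one level down appends a bit to every remaining prefix
        }
        return out;
    }

    public static void main(String[] args) {
        int[] levels = leavesPerLevel(new long[] {5, 2, 1, 1});
        System.out.println(java.util.Arrays.toString(levels));
        System.out.println(codes(levels));
    }
}
```

For the frequencies 5, 2, 1, 1 the level counts come out as 0, 1, 1, 2 (one leaf at depth 1, one at depth 2, two at depth 3), and the codes assigned are 0, 10, 110 and 111.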
APPENDIX  B:  SUMMARY  OF  EMPIRICAL  DATA


To determine the performance of both RTA and ITA, we ran comparisons against
standard Huffman encoding, using the Canterbury Corpus [19] and the well-known
"Lenna" photograph [20]. The corpus is a suite of eleven text-based files, commonly
used as an industry benchmark for testing lossless compression algorithms. This
collection was developed in 1997 as an improved version of the older Calgary corpus.
The files were chosen because their results on existing compression algorithms are
"typical", and so it is hoped that they will be representative for new methods of
compression as well. Lenna is a Tagged Image File (TIF), which is an industry benchmark
for image compression. The files that comprise the Canterbury Corpus are:


File           Abbrev   Category             Size in bytes
alice29.txt    text     English text                152089
asyoulik.txt   play     Shakespeare                 125179
cp.html        html     HTML source                  24603
fields.c       Csrc     C source                     11150
grammar.lsp    list     LISP source                   3721
kennedy.xls    Excl     Excel Spreadsheet          1029744
lcet10.txt     tech     Technical writing           426754
plrabn12.txt   poem     Poetry                      481861
ptt5           fax      CCITT test set              513216
sum            SPRC     SPARC Executable             38240
xargs.1        man      GNU manual page               4227

Table 15: Canterbury Corpus Test Suite

The following graphs summarize the results of RTA, ITA, and Huffman encoding
against the test suite files mentioned above. The title of the graph identifies the file and
the tabular data represents the optimal compression for the given n. The ITA results of
each graph represent the optimal combination of |iBits| + |jBits| = n for each file. The
RTA results do not extend beyond n = 24 due to the exponential memory
requirements of this approach.
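
The memory growth can be made concrete using the estimate that the StatsManager listing in Appendix A prints before an RTA run: 5 bytes per bit string of length n (a 4-byte int plus a 1-byte rotation entry) and 4 bytes per necklace class. The sketch below is illustrative only; RtaTableMemory and tableBytes are hypothetical names, and the class count is recomputed with exact integer arithmetic.

```java
final class RtaTableMemory {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Euler's totient: how many i in 1..d are relatively prime to d. */
    static int totient(int d) {
        int count = 0;
        for (int i = 1; i <= d; i++) {
            if (gcd(d, i) == 1) count++;
        }
        return count;
    }

    /** Number of binary necklace classes of length n (Burnside formula). */
    static long numClasses(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) sum += totient(d) * (1L << (n / d));
        }
        return sum / n;
    }

    /** Bytes for the lookup tables: 5 per bit string plus 4 per necklace class. */
    static long tableBytes(int n) {
        return 5L * (1L << n) + 4L * numClasses(n);
    }

    public static void main(String[] args) {
        for (int n = 8; n <= 32; n += 8) {
            System.out.println("n = " + n + ": " + tableBytes(n) + " bytes");
        }
    }
}
```

At n = 24 the estimate is 86,683,088 bytes, and the 5 * 2^n term doubles with every additional bit of word length.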


Chart 2: Compression Results for Test File fields.c

Chart 3: Compression Results for Test File plrabn12.txt

Chart 4: Compression Results for Test File lcet10.txt


Chart 13: Compression Results for Test File lena.tif


APPENDIX  C:  MAPPING  OF  ASCII  KEYBOARD  CHARACTERS  BY  RTA


Character   ASCII Value   Binary     Tree Index
space       32            00100000   2
!           33            00100001   7
"           34            00100010   6
#           35            00100011   6
$           36            00100100   2
%           37            00100101   6
&           38            00100110   6
'           39            00100111   5
(           40            00101000   2
)           41            00101001   2
*           42            00101010   2
+           43            00101011   6
,           44            00101100   4
-           45            00101101   4
.           46            00101110   4
/           47            00101111   4
0           48            00110000   2
1           49            00110001   2
2           50            00110010   2
3           51            00110011   2
4           52            00110100   2
5           53            00110101   2
6           54            00110110   2
7           55            00110111   5
8           56            00111000   2
9           57            00111001   2
:           58            00111010   2
;           59            00111011   2
<           60            00111100   2
=           61            00111101   2
>           62            00111110   2
?           63            00111111   2
@           64            01000000   1
A           65            01000001   1
B           66            01000010   1
C           67            01000011   6
D           68            01000100   1
E           69            01000101   5
F           70            01000110   5
G           71            01000111   5
H           72            01001000   1
I           73            01001001   7
J           74            01001010   4
K           75            01001011   6
L           76            01001100   4
M           77            01001101   4
N           78            01001110   4
O           79            01001111   4
P           80            01010000   1
Q           81            01010001   1
R           82            01010010   1
S           83            01010011   6
T           84            01010100   1
U           85            01010101   1
V           86            01010110   5
W           87            01010111   5
X           88            01011000   3
Y           89            01011001   3
Z           90            01011010   3
[           91            01011011   3
\           92            01011100   3
]           93            01011101   3
^           94            01011110   3
_           95            01011111   3
`           96            01100000   1
a           97            01100001   1
b           98            01100010   1
c           99            01100011   6
d           100           01100100   1
e           101           01100101   1
f           102           01100110   1
g           103           01100111   5
h           104           01101000   1
i           105           01101001   1
j           106           01101010   1
k           107           01101011   6
l           108           01101100   1
m           109           01101101   1
n           110           01101110   4
o           111           01101111   4
p           112           01110000   1
q           113           01110001   1
r           114           01110010   1
s           115           01110011   1
t           116           01110100   1
u           117           01110101   1
v           118           01110110   1
w           119           01110111   1
x           120           01111000   1
y           121           01111001   1
z           122           01111010   1
{           123           01111011   1
|           124           01111100   1
}           125           01111101   1
~           126           01111110   1

Table  15:  ASCII  Mapping  by  RTA 
APPENDIX D: SAMPLE OUTPUT FROM ITA COMPRESSION PROGRAM


Compression method: 4 (indexed trees)
Compressions performed: 1

FILENAME    : c:\My Documents\filesToCompress\sum
n           : 16
iBits       : 4
jBits       : 12
offset      : 0
file size   : 37.344 KB
header size : 3719 bytes
HUF size    : 24.199 KB
change      : -35.1%
MTC size    : 23.258 KB
change      : -37.6%


tree    strings    leafs
0       39.0%      892
1       2.5%       61
2       6.7%       222
3       16.0%      221
4       4.1%       175
5       2.7%       110
6       13.0%      282
7       8.5%       166
8       2.9%       33
9       1.5%       47
10      0.5%       46
11      0.3%       17
12      0.3%       13
13      1.2%       30
14      0.4%       18
15      0.5%       56



APPENDIX  E:  RTA  COMPRESSION  FORMAT 


Field                                                                  Size

n                                                                      5 bits
# of offset (uncompressed) bits f, where f < n                         5 bits
actual offset bits                                                     f bits
# of remainder (uncompressed) bits r, where r < n                      5 bits
actual remainder bits                                                  r bits

class tree 0
  bits used b to represent # leafs L at each level of Huffman tree     5 bits
  # of levels v in Huffman tree                                        5 bits
  leafs in level v - 1 of Huffman tree (least freq)                    b bits
  leafs in level v - 2 of Huffman tree                                 b bits
  ...
  leafs in level 0 of Huffman tree (most freq)                         b bits
  level v - 1 class(es) (longest code)                                 ceil(log2 n) bits
  level v - 2 class(es)                                                ceil(log2 n) bits
  ...
  level 0 class(es) (shortest code)                                    ceil(log2 n) bits

class tree 1                                                           (same as above)
...
class tree n - 1                                                       (same as above)

rotation tree n
  bits used b to represent # leafs L at each level of Huffman tree     5 bits
  # of levels v in Huffman tree                                        5 bits
  leafs in level v - 1 of Huffman tree (least freq)                    b bits
  leafs in level v - 2 of Huffman tree                                 b bits
  ...
  leafs in level 0 of Huffman tree (most freq)                         b bits
  level v - 1 rotation(s) (longest code)                               ceil(log2 n) bits
  level v - 2 rotation(s)                                              ceil(log2 n) bits
  ...
  level 0 rotation(s) (shortest code)                                  ceil(log2 n) bits

# bitstrings processed                                                 32 bits
encoded rotations                                                      variable
encoded class index                                                    variable
encoded rotations                                                      variable
encoded class index                                                    variable
...
last encoded rotations                                                 variable
last encoded class index                                               variable
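
The fixed-width fields of this format (5-bit counts, b-bit leaf counts, and ceil(log2 n)-bit entries) can be emitted with a small bit writer. The following is a minimal sketch under stated assumptions, not our implementation: the class name HeaderBits and its methods are hypothetical, and both the most-significant-bit-first order within each field and the zero padding of the final byte are assumptions that the format description does not fix.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical bit writer for illustration only; bit order is an assumption.
class HeaderBits {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // bits collected for the byte in progress
    private int filled = 0;  // number of bits of 'current' already used

    /** Appends the low 'width' bits of value, most significant bit first. */
    void write(long value, int width) {
        for (int i = width - 1; i >= 0; i--) {
            current = (current << 1) | (int) ((value >>> i) & 1L);
            if (++filled == 8) {
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
    }

    /** Pads the last partial byte with zero bits and returns all bytes. */
    byte[] finish() {
        if (filled > 0) {
            out.write(current << (8 - filled));
            current = 0;
            filled = 0;
        }
        return out.toByteArray();
    }

    /** Smallest w with 2^w >= n, that is, ceil(log2 n) for n >= 1. */
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        HeaderBits w = new HeaderBits();
        w.write(16, 5);    // n = 16 in a 5-bit field
        w.write(3, 5);     // f = 3 offset bits follow
        w.write(0b101, 3); // the f offset bits themselves
        System.out.println(w.finish().length + " bytes, entry width " + ceilLog2(16));
    }
}
```

With n = 16, each ceil(log2 n)-bit entry in the sketch occupies 4 bits; the three fields written in main total 13 bits and are therefore padded out to 2 bytes.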
APPENDIX F: ITA COMPRESSION FORMAT


Field                                                                  Size

iBits                                                                  5 bits
jBits                                                                  5 bits
# of offset (uncompressed) bits f, where f < n                         5 bits
actual offset bits                                                     f bits
# of remainder (uncompressed) bits r, where r < n                      5 bits
actual remainder bits                                                  r bits

index tree 0
  # of non-empty levels v in Huffman tree                              5 bits
  first non-empty level F                                              5 bits
  bits used b to represent # leafs L at each level of Huffman tree     5 bits
  leafs in level F + v - 1 of Huffman tree (least freq)                b bits
  leafs in level F + v - 2 of Huffman tree                             b bits
  ...
  leafs in level F of Huffman tree (most freq)                         b bits
  level F + v - 1 jWord (longest code)                                 jBits bits
  level F + v - 2 jWord                                                jBits bits
  ...
  level F jWord (shortest code)                                        jBits bits

index tree 1                                                           (same as above)
...
index tree 2^iBits - 1                                                 (same as above)

index tree 2^iBits (iTree)
  # of non-empty levels v in Huffman tree                              5 bits
  first non-empty level F                                              5 bits
  bits used b to represent # leafs L at each level of Huffman tree     5 bits
  leafs in level F + v - 1 of Huffman tree (least freq)                b bits
  leafs in level F + v - 2 of Huffman tree                             b bits
  ...
  leafs in level F of Huffman tree (most freq)                         b bits
  level F + v - 1 iWord (longest code)                                 iBits bits
  level F + v - 2 iWord                                                iBits bits
  ...
  level F iWord (shortest code)                                        iBits bits

# bitstrings processed                                                 32 bits
encoded iWord                                                          variable
encoded jWord                                                          variable
encoded iWord                                                          variable
encoded jWord                                                          variable
...
last encoded iWord                                                     variable
last encoded jWord                                                     variable
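
Because each tree is stored only as leaf counts per level followed by the words of each level, a decoder must rebuild the code words from that information. The following minimal sketch shows one conventional way to do so, a canonical assignment in which shallower levels receive numerically smaller codes. This assignment is an assumption: the format above does not state how code words are ordered within a level, and the class LevelCodes and its methods are hypothetical names. The arrays are taken in file order, deepest level first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder-side helper; canonical code assignment is an assumption.
class LevelCodes {
    /**
     * countsDeepFirst[k] = leaves at level firstLevel + (v - 1 - k), v = array length.
     * wordsDeepFirst lists the words of each level in the same (deepest-first) order.
     * Returns word -> code string, assigned from the shallowest level downward.
     */
    static Map<Integer, String> assign(int firstLevel, int[] countsDeepFirst, int[] wordsDeepFirst) {
        int v = countsDeepFirst.length;
        int[] start = new int[v]; // index of the first word of each level
        for (int k = 1; k < v; k++) {
            start[k] = start[k - 1] + countsDeepFirst[k - 1];
        }
        Map<Integer, String> codes = new LinkedHashMap<>();
        long code = 0;
        for (int k = v - 1; k >= 0; k--) {      // shallowest level first
            int len = firstLevel + (v - 1 - k); // code length equals tree level
            for (int s = 0; s < countsDeepFirst[k]; s++) {
                codes.put(wordsDeepFirst[start[k] + s], toBits(code, len));
                code++;
            }
            code <<= 1; // descend one level
        }
        return codes;
    }

    /** Renders the low 'len' bits of code as a string of 0s and 1s. */
    static String toBits(long code, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = len - 1; i >= 0; i--) {
            sb.append((code >>> i) & 1L);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // F = 1, v = 3: two leaves at level 3, one at level 2, one at level 1
        System.out.println(assign(1, new int[] {2, 1, 1}, new int[] {9, 2, 5, 7}));
    }
}
```

In the example, the word 7 at level 1 receives the code 0, the word 5 at level 2 receives 10, and the words 9 and 2 at level 3 receive 110 and 111, which is a prefix-free set whose code lengths match the stored levels.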